In this paper, we study the nonparametric maximum likelihood
estimator for an event time distribution function at a point in the
current status model with observation times supported on a grid
of potentially unknown sparsity and with multiple subjects sharing
the same observation time. This is of interest since observation time
ties occur frequently with current status data. The grid resolution
is specified as $cn^{-\gamma}$ with $c > 0$ being a scaling constant and $\gamma > 0$
regulating the sparsity of the grid relative to $n$, the number of subjects.
The asymptotic behavior falls into three cases depending on $\gamma$:
regular Gaussian-type asymptotics obtain for $\gamma < 1/3$, nonstandard
cube-root asymptotics prevail when $\gamma > 1/3$ and $\gamma = 1/3$ serves as
a boundary at which the transition happens. The limit distribution
at the boundary is different from either of the previous cases and
converges weakly to those obtained with $\gamma \in (0, 1/3)$ and $\gamma \in (1/3, \infty)$
as $c$ goes to $\infty$ and 0, respectively. This weak convergence allows us
to develop an adaptive procedure to construct confidence intervals for
the value of the event time distribution at a point of interest without
needing to know or estimate $\gamma$, which is of enormous advantage
from the perspective of inference. A simulation study of the adaptive
procedure is presented.

1. Introduction. The current status model is one of the most well-studied 
survival models in statistics. An individual at risk for an event of interest is 
monitored at a random observation time, and an indicator of whether the 
event has occurred is recorded. An interesting feature of this kind of data 
is that the underlying event time distribution, $F$, can be estimated by its
nonparametric maximum likelihood estimator (NPMLE) at only $n^{1/3}$ rate
when the observation time is a continuous random variable. Under mild
conditions on $F$, the limiting distribution of the NPMLE in this setting
is the non-Gaussian Chernoff distribution: the distribution of the unique
minimizer of $\{W(t) + t^2 : t \in \mathbb{R}\}$, where $W(t)$ is standard two-sided
Brownian motion. This is in contrast to data with right-censored event times
where $F$ can be estimated nonparametrically at rate $\sqrt{n}$ and is "pathwise
norm-differentiable" in the sense of van der Vaart (1991), admitting regular
estimators and normal limits. Interestingly, when the observation time
distribution has finite support, the NPMLE for $F$ at a point asymptotically
simplifies to a binomial random variable and is also $\sqrt{n}$ estimable and
regular, with a normal limiting distribution.

An extensive amount of work has been done for inference in the cur- 
rent status model under the assumption of a continuous distribution for the 
observation time: the classical model considers n subjects whose survival 
times $T_1, T_2, \ldots, T_n$ are i.i.d. $F$ and whose inspection times $X_1, X_2, \ldots, X_n$
are i.i.d. with some continuous distribution, say $G$; furthermore, in the absence
of covariates, the $X_i$'s and $T_i$'s are considered mutually independent.
The observed data are $\{\Delta_i, X_i\}_{i=1}^{n}$, where $\Delta_i = 1\{T_i \le X_i\}$, and one is interested
in estimating $F$ as $n$ goes to infinity. More specifically, for inference on
the value of F at a pre-fixed point of interest under a continuous observation 
time, see, for example, Groeneboom and Wellner (1992), who establish the 
convergence of the normalized NPMLE to Chernoff's distribution; Keiding
et al. (1996); Wellner and Zhang (2000), who develop pseudo-likelihood es- 
timates of the mean function of a counting process with panel count data, 
current status data being a special case; Banerjee and Wellner (2001) and 
Banerjee and Wellner (2005), who develop an asymptotically pivotal like- 
lihood ratio based method; Sen and Banerjee (2007), who extend the re- 
sults of Wellner and Zhang (2000) to asymptotically pivotal inference for F 
with mixed-case interval-censoring; and Groeneboom, Jongbloed and Witte 
(2010) for smoothed isotonic estimation, to name a few. 

However, somewhat surprisingly, the problem of making inference on F 
when the observation times lie on a grid with multiple subjects sharing the 
same observation time has never been satisfactorily addressed in this rather 
large literature. This important scenario, which transpires when the inspec- 
tion times for individuals at risk are evenly spaced, and multiple subjects can 
be inspected at any inspection time, is completely precluded by the assump- 
tion of a continuous G, as this does not allow ties among observation times. 
Consider, for example, a tumorigenicity study where a large number of mice 
are exposed to some carcinogen at a particular time, and interest centers 
on the time to development of a tumor. A typical procedure here would 
be to randomize the mice to be sacrificed over a number of days following 
exposure; so, one can envisage a protocol of sacrificing a fixed number $m$ of
mice at 24 hours post-exposure, another $m$ mice at 48 hours and so on. The
sacrificed mice are then dissected and examined for tumors, thereby lead- 
ing to current status data on a grid. A pertinent question in this setting is: 
what is the probability that a mouse develops a tumor by an M-day period 
after exposure? This involves estimating F(24M), where F is the distribu- 
tion function of the time to tumor-development. Similar grid-based data can 
occur with human subjects in clinical settings. 

In this paper we provide a clean solution to this problem based on the 
NPMLE of F which, as is well known, is obtained through isotonic regres- 
sion [see, e.g., Robertson, Wright and Dykstra (1988)]. The NPMLE of F in 
the current status model (and more generally in nonparametric monotone 
function models) has a long history and has been studied extensively. In ad- 
dition to the attractive feature that it can be computed without specifying 
a bandwidth, the NPMLE of $F(x_0)$ (where $x_0$ is a fixed point) attains the
best possible convergence rate, namely $n^{1/3}$, in the "classical" current status
model with continuous observation times, under the rather mild assumption
that $F$ is continuously differentiable in a neighborhood of $x_0$ and has a
nonvanishing derivative at $x_0$. This rate cannot be bettered by a smooth estimate
under the assumption of a single derivative. As demonstrated in Groeneboom,
Jongbloed and Witte (2010), smoothed monotone estimates of $F$ can
achieve a faster $n^{2/5}$ rate under a twice-differentiability assumption on $F$;
hence, the faster rate requires additional smoothness. However, as we wish to
approach our problem under minimal smoothness assumptions, the isotonic 
NPMLE is the more natural choice. (Smoothing the NPMLE would intro- 
duce an exogenous tuning parameter without providing any benefit from the 
point of view of the convergence rate.) 

The key step, then, is to determine the best asymptotic approximation to 
use for the NPMLE in the grid-based setting discussed above. If, for example, 
the number of observation times, K, is far smaller than n, the number of 
subjects, the problem is essentially a parametric one, and it is reasonable to 
expect that normal approximations to the MLE will work well. On the other 
hand, if K = n, that is, we have a very fine grid with each subject having their 
own inspection time, the scenario is similar to the current status model with 
continuous observation times where no two inspection times coincide, and 
one may expect a Chernoff approximation to be adequate. However, there 
is an entire spectrum of situations in between these extremes depending on 
the size of the grid, K, relative to n, and if n is "neither too large, nor too 
small relative to K," neither of these two approximations would be reliable. 

Some work on the current status model or closely related variants un- 
der discrete observation time settings should be noted in this context. Yu 
et al. (1998) have studied the asymptotic properties of the NPMLE of F 
in the current status model with discrete observation times, and more recently
Maathuis and Hudgens (2011) have considered nonparametric inference
for (finitely many) competing risks current status data under discrete or
grouped observation times. However, these papers consider situations where 
the observation times are i.i.d. copies from a fixed discrete distribution (but 
not necessarily finitely supported) on the time-domain and are therefore not 
geared toward studying the effect of the trade-off between n and K, that is, 
the effect of the relative sparsity of the number of distinct observation times 
to the size of the cohort of individuals on inference for F. In both these 
papers, the pointwise estimates of $F$ are $\sqrt{n}$ consistent and asymptotically
normal; but as Maathuis and Hudgens (2011) demonstrate in Section 5.1 of 
their paper, when the number of distinct observation times is large relative 
to the sample size, the normal approximations are suspect. 

Our approach is to couch the problem in an asymptotic framework where K 
is allowed to increase with $n$ at rate $n^{\gamma}$ for some $0 < \gamma \le 1$ and study the
behavior of the NPMLE at a grid-point. This is achieved by considering the
current status model on a regular grid over a compact time interval, say
$[a, b]$, with unit spacing $\delta = \delta_n = cn^{-\gamma}$, $c$ being a scale parameter. It will be
seen that the limit behavior of the NPMLE depends heavily on the "sparsity
parameter" $\gamma$, with the Gaussian approximation prevailing for $\gamma < 1/3$
and the Chernoff approximation for $\gamma > 1/3$. When $\gamma = 1/3$, one obtains
a discrete analog of the Chernoff distribution which depends on $c$. Thus,
there is an entire family of what we call boundary distributions, indexed
by $c$, say $\{F_c : c > 0\}$, by manipulating which, one can approach either the
Gaussian or the Chernoff. As $c$ approaches 0, $F_c$ approximates the Chernoff
while, as $c$ approaches $\infty$, it approaches the Gaussian. This property allows
us to develop an adaptive procedure for setting confidence intervals for the
value of $F$ at a grid-point that obviates the need to know or estimate $\gamma$, the
critical parameter in this entire business as it completely dictates the ensuing
asymptotics. The adaptive procedure involves pretending that the true
unknown underlying $\gamma$ is at the boundary value 1/3, computing
a surrogate $c$, say $\hat{c}$, by equating $(b - a)/K$, the spacing of the grid (which
is computable from the data), to $\hat{c}n^{-1/3}$ and using $F_{\hat{c}}$ to approximate the
distribution of the appropriately normalized NPMLE. The details are given
in Section 4. It is seen that this procedure provides asymptotically correct
confidence intervals regardless of the true value of $\gamma$. Our procedure does
involve estimating some nuisance parameters, but this is readily achieved 
via standard methods. 
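
The surrogate-scale step of the adaptive procedure can be sketched in a few lines of Python. This is only an illustration of the equation $(b - a)/K = \hat{c}n^{-1/3}$ described above; the function name `surrogate_c` and the example numbers are assumptions made for this sketch, and the subsequent use of the boundary distribution (detailed in Section 4) is not shown.

```python
def surrogate_c(a, b, K, n):
    """Solve (b - a) / K = c_hat * n ** (-1/3) for c_hat (hypothetical helper)."""
    if K <= 0 or n <= 0 or b <= a:
        raise ValueError("need K > 0, n > 0 and a < b")
    spacing = (b - a) / K  # grid spacing, computable from the data
    return spacing * n ** (1.0 / 3.0)


# Example with made-up numbers: 20 grid points on [0, 2] and n = 1000 subjects.
c_hat = surrogate_c(0.0, 2.0, 20, 1000)
```

If the true spacing is $cn^{-\gamma}$, then, up to the rounding in $K$, the surrogate behaves like $cn^{1/3 - \gamma}$: it grows without bound when $\gamma < 1/3$ and shrinks to 0 when $\gamma > 1/3$, which matches the directions in which the boundary family approaches the Gaussian and the Chernoff limits.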

The rest of the paper is organized as follows. In Section 2, we present the 
mathematical formulation of the problem and introduce some key notions 
and characterizations. Section 3 presents the main asymptotic results and 
their connections to existing work. Section 4 addresses the important ques- 
tion of adaptive inference in the current status model: given a time-domain 
and current status data observed at times on a regular grid of an unknown
level of sparsity over the domain, how do we make inference on F? Section 5 
discusses the implementation of the procedure and presents results from sim- 
ulation studies, and Section 6 concludes with a discussion of the findings of 
this paper and their implications for monotone regression models in general, 
as well as more complex forms of interval censoring and interval censoring 
with competing risks. The Appendix contains some technical details. 

2. Formulation of the problem. Let $\{T_{i,n}\}_{i=1}^{n}$ be i.i.d. survival times
following some unknown distribution $F$ with Lebesgue density $f$ concentrated
on the time-domain $[a', b']$ with $0 \le a' < b' < \infty$ (or supported on
$[a', \infty)$ if no such $b'$ exists) and $\{X_{i,n}\}$ be i.i.d. observation times drawn
from a discrete probability measure $H_n$ supported on a regular grid on
$[a, b]$ with $a' \le a < b < b'$. Also, $T_{i,n}$ and $X_{i,n}$ are assumed to be independent
for each $i$. However, $\{T_{i,n}\}$ are not observed; rather, we observe
$\{Y_{i,n} = 1\{T_{i,n} \le X_{i,n}\}\}$. This puts us in the setting of a binary regression
model with $Y_{i,n} \mid X_{i,n} \sim \mathrm{Bernoulli}(F(X_{i,n}))$. We denote the support of
$H_n$ by $\{t_{i,n}\}_{i=1}^{K}$ where the $i$th grid point $t_{i,n} = a + i\delta$, the unit spacing
$\delta = \delta(n) = cn^{-\gamma}$ (also referred to as the grid resolution) with $\gamma \in (0, 1]$ and
$c > 0$, and the number of grid points $K = K(n) = \lfloor (b - a)/\delta \rfloor$. On this grid,
the distribution $H_n$ is viewed as a discretization of an absolutely continuous
distribution $G$, whose support contains $[a, b]$ and whose Lebesgue density
is denoted as $g$. More specifically, $H_n\{t_{i,n}\} = G(t_{i,n}) - G(t_{i-1,n})$, for
$i = 2, 3, \ldots, K - 1$, $H_n\{t_{1,n}\} = G(t_{1,n})$ and $H_n\{t_{K,n}\} = 1 - G(t_{K-1,n})$. For
simplicity, these discrete probabilities are denoted as $p_{i,n} = H_n\{t_{i,n}\}$ for
each $i$. In what follows, we refer to the pair $(X_{i,n}, Y_{i,n})$ as $(X_i, Y_i)$, suppressing
the dependence on $n$, but the triangular array nature of our observed
data should be kept in mind. Similarly, the subscript $n$ is suppressed
elsewhere when no confusion will be caused.
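
As an illustration of the sampling scheme just described, the following Python sketch simulates current status data on a grid. The function names, the exponential choice for $F$ and the uniform choice for $G$ on $[a, b]$ are assumptions made purely for this example; only the grid construction, the discretization of $G$ into $H_n$ and the indicator $Y_i = 1\{T_i \le X_i\}$ follow the text.

```python
import math
import random


def make_grid(n, gamma, c, a, b):
    """Grid points t_i = a + i * delta, i = 1..K, with delta = c * n ** (-gamma)."""
    delta = c * n ** (-gamma)
    K = int(math.floor((b - a) / delta))
    if K < 1:
        raise ValueError("grid spacing exceeds the length of [a, b]")
    return [a + i * delta for i in range(1, K + 1)]


def grid_probabilities(grid, G):
    """Discretize a continuous cdf G into the point masses of H_n."""
    K = len(grid)
    if K == 1:
        return [1.0]
    probs = [G(grid[0])]                                    # H_n{t_1}
    probs += [G(grid[i]) - G(grid[i - 1]) for i in range(1, K - 1)]
    probs.append(1.0 - G(grid[K - 2]))                      # H_n{t_K}
    return probs


def simulate_current_status(n, gamma, c, a, b, F_inv, G, rng):
    """Draw (X_i, Y_i), i = 1..n; F_inv is the quantile function of F."""
    grid = make_grid(n, gamma, c, a, b)
    probs = grid_probabilities(grid, G)
    xs = rng.choices(grid, weights=probs, k=n)              # observation times
    ts = [F_inv(rng.random()) for _ in range(n)]            # unobserved event times
    ys = [1 if t <= x else 0 for t, x in zip(ts, xs)]       # current status
    return grid, probs, xs, ys


# Illustrative choices only: F = Exp(1), G = Uniform[0, 2], boundary case gamma = 1/3.
rng = random.Random(0)
grid, probs, xs, ys = simulate_current_status(
    n=500, gamma=1.0 / 3.0, c=1.0, a=0.0, b=2.0,
    F_inv=lambda u: -math.log(1.0 - u),
    G=lambda t: min(max(t / 2.0, 0.0), 1.0),
    rng=rng,
)
```

Drawing the observation times with `rng.choices` makes ties the rule rather than the exception, which is precisely the feature that a continuous $G$ cannot reproduce.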

Our interest lies in estimating F at a grid-point. Since we allow the grid 
to change with n, this will be accomplished by specifying a grid-point with 
respect to a fixed time $x_0 \in (a, b)$ which does not depend on $n$ and can be
viewed as an "anchor-point." Define $t_l = t_{l,n}$ to be the largest grid-point
less than or equal to $x_0$. We devote our interest to $\hat{F}(t_l)$. More specifically,
we are interested in the limit distribution of $\hat{F}(t_l) - F(t_l)$ under appropriate
normalization. To this end, we start with the characterization of the NPMLE
in this model. While this is well known from the current status literature, 
we include a description tailored for the setting of this paper. 

The likelihood function of the data $\{(X_i, Y_i)\}$ is given by

$$L_n(F) = \prod_{j=1}^{n} F(X_j)^{Y_j} (1 - F(X_j))^{1 - Y_j} p_{\{i : X_j = t_i\}} = \prod_{i=1}^{K} F_i^{Z_i} (1 - F_i)^{N_i - Z_i} p_i^{N_i},$$

where $p_{\{i : X_j = t_i\}}$ denotes the probability that $X_j$ equals a generic grid point $t_i$,
$F_i$ is an abbreviation for $F(t_i)$, $N_i = \sum_{j=1}^{n} \{X_j = t_i\}$ is the number of
observations at $t_i$, $Z_i = \sum_{j=1}^{n} Y_j \{X_j = t_i\}$ is the sum of the responses at $t_i$, $\{\cdot\}$
stands for both a set and its indicator function with the meaning depending
on the context and $F$ is generically understood as either a distribution or the
vector $(F_1, F_2, \ldots, F_K)$, which sometimes is also written as $\{F_i\}_{i=1}^{K}$. Then,
the log-likelihood function is given by

$$l_n(F) = \log(L_n(F)) = \sum_{i=1}^{K} N_i \log p_i + \sum_{i=1}^{K} N_i [\bar{Z}_i \log F_i + (1 - \bar{Z}_i) \log(1 - F_i)],$$

where $\bar{Z}_i = Z_i/N_i$ is the average of the responses at $t_i$.
Denote the basic shape-restricted maximizer as

$$\{F_i^*\}_{i=1}^{K} = \arg\max_{F_1 \le \cdots \le F_K} l_n(F).$$

From the theory of isotonic regression [see, e.g., Robertson, Wright and
Dykstra (1988)], we have

$$\arg\max_{F_1 \le \cdots \le F_K} l_n(F) = \arg\min_{F_1 \le \cdots \le F_K} \sum_{i=1}^{K} [(\bar{Z}_i - F_i)^2 N_i].$$

Thus, $\{F_i^*\}_{i=1}^{K}$ is the weighted isotonic regression of $\{\bar{Z}_i\}_{i=1}^{K}$ with weights
$\{N_i\}_{i=1}^{K}$, and exists uniquely. We conventionally define the shape-restricted
NPMLE of $F$ on $[a, b]$ as the following right-continuous step function:

$$\hat{F}(t) = \begin{cases} 0, & \text{if } t \in [a, t_1); \\ F_i^*, & \text{if } t \in [t_i, t_{i+1}),\ i = 1, \ldots, K - 1; \\ F_K^*, & \text{if } t \in [t_K, b]. \end{cases} \tag{2.1}$$
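
The weighted isotonic regression above can be computed by the pool adjacent violators algorithm. The following Python sketch is one minimal way to do so; the function names are hypothetical, and the choice to drop grid points with $N_i = 0$ (which carry zero weight in the least squares criterion) is an assumption of this sketch rather than a convention taken from the paper.

```python
def pava(values, weights):
    """Weighted nondecreasing isotonic regression by pooling adjacent violators."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def npmle_on_grid(grid, xs, ys):
    """Return the grid points with N_i > 0 and the fitted values F*_i there."""
    index = {t: i for i, t in enumerate(grid)}
    N = [0] * len(grid)
    Z = [0] * len(grid)
    for x, y in zip(xs, ys):
        N[index[x]] += 1        # number of observations at t_i
        Z[index[x]] += y        # sum of responses at t_i
    kept = [i for i in range(len(grid)) if N[i] > 0]
    zbar = [Z[i] / N[i] for i in kept]
    fit = pava(zbar, [N[i] for i in kept])
    return [grid[i] for i in kept], fit


# Toy data: averages (1/2, 0, 1) with weights (2, 1, 2); the first two get pooled.
points, F_star = npmle_on_grid(
    [1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0, 3.0], [1, 0, 0, 1, 1]
)
```

The step function in (2.1) is then obtained by holding each fitted value constant until the next grid point.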

Next, we provide a characterization of $\hat{F}$ as the slope of the greatest convex
minorant (GCM) of a random process, which proves useful for deriving
the asymptotics for $\gamma \in [1/3, 1]$. Define, for $t \in [a, b]$,

$$G_n(t) = \mathbb{P}_n\{x \le t\}, \qquad V_n(t) = \mathbb{P}_n y\{x \le t\}, \tag{2.2}$$

where $\mathbb{P}_n$ is the empirical probability measure based on the data $\{(X_i, Y_i)\}$.
Then, we have, for each $x \in [a, b]$,

$$\hat{F}(x) = \mathrm{LS}[\mathrm{GCM}\{(G_n(t), V_n(t)), t \in [a, b]\}](G_n(x)). \tag{2.3}$$

In the above display, GCM means the greatest convex minorant of a set of
points in $\mathbb{R}^2$. For any finite collection of points in $\mathbb{R}^2$, its GCM is a continuous
piecewise linear convex function, and $\mathrm{LS}[\cdot]$ denotes the left slope or derivative
function of a convex function. The term GCM will also be used to refer to the
greatest convex minorant of a real-valued function defined on a sub-interval
of the real line.
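
The slope characterization (2.3) can be checked numerically on small examples. The sketch below builds the cumulative sum diagram from the counts $N_i$ and sums $Z_i$ and reads off the left slopes of its greatest convex minorant with a standard lower convex hull scan. The function names are hypothetical, and the sketch assumes every $N_i > 0$ so that the abscissae are strictly increasing.

```python
def cusum_diagram(N, Z):
    """Points (G_n(t_i), V_n(t_i)) for i = 0..K, with the origin prepended."""
    n = sum(N)
    gs, vs = [0.0], [0.0]
    cum_n = cum_z = 0
    for Ni, Zi in zip(N, Z):
        cum_n += Ni
        cum_z += Zi
        gs.append(cum_n / n)
        vs.append(cum_z / n)
    return gs, vs


def gcm_left_slopes(xs, ys):
    """Left slopes of the GCM of the points (xs[k], ys[k]) at xs[1], ..., xs[-1]."""
    hull = [0]  # indices of the vertices of the lower convex hull
    for k in range(1, len(xs)):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            # Drop vertex j if it lies on or above the chord from i to k.
            if (ys[j] - ys[i]) * (xs[k] - xs[i]) >= (ys[k] - ys[i]) * (xs[j] - xs[i]):
                hull.pop()
            else:
                break
        hull.append(k)
    slopes = []
    for i, j in zip(hull, hull[1:]):
        slopes.extend([(ys[j] - ys[i]) / (xs[j] - xs[i])] * (j - i))
    return slopes


# Toy counts: N = (2, 1, 2) observations and Z = (1, 0, 2) events per grid point.
gs, vs = cusum_diagram([2, 1, 2], [1, 0, 2])
F_star = gcm_left_slopes(gs, vs)
```

On this toy input the raw averages are $(1/2, 0, 1)$; the first two violate monotonicity and are replaced by their weighted average, which is what the weighted isotonic regression of the previous display prescribes.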
Finally, we introduce a number of random processes that will appear in
the asymptotic descriptions of $\hat{F}$.

For constants $\kappa_1 > 0$ and $\kappa_2 > 0$, denote

$$X_{\kappa_1,\kappa_2}(h) = \kappa_1 W(h) + \kappa_2 h^2 \quad \text{for } h \in \mathbb{R}, \tag{2.4}$$

where $W$ is a two-sided Brownian motion with $W(0) = 0$. Let $G_{\kappa_1,\kappa_2}$ be the
GCM of $X_{\kappa_1,\kappa_2}$. Define, for $h \in \mathbb{R}$,

$$g_{\kappa_1,\kappa_2}(h) = \mathrm{LS}[G_{\kappa_1,\kappa_2}](h). \tag{2.5}$$

The process $g_{\kappa_1,\kappa_2}$ will characterize the asymptotic behavior of a localized
NPMLE process in the vicinity of $t_l$ for $\gamma > 1/3$, from which the large sample
distribution of $\hat{F}(t_l)$ can be deduced.

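We also define a three parameter family of processes in discrete time which
serve as discrete versions of the continuous-time processes above. For $c$, $\kappa_1$,
$\kappa_2 > 0$, let

$$\{\mathcal{P}_{c,\kappa_1,\kappa_2}(k)\}_{k \in \mathbb{Z}} = \{ck, \kappa_1 W(ck) + \kappa_2 c^2 k(1 + k)\}_{k \in \mathbb{Z}}. \tag{2.6}$$

Define

$$\mathbb{X}_{c,\kappa_1,\kappa_2}(ci) = \mathrm{LS}[\mathrm{GCM}\{\mathcal{P}_{c,\kappa_1,\kappa_2}(k) : k \in \mathbb{Z}\}](ci). \tag{2.7}$$

This slope process will characterize the asymptotic behavior of the NPMLE
in the case $\gamma = 1/3$.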
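
Draws from the slope process in (2.7) at $i = 0$ can be approximated by simulation. The sketch below is an illustration under stated assumptions, not a procedure taken from the paper: it truncates the index set $\mathbb{Z}$ to $\{-M, \ldots, M\}$ (the truncation level $M$ is a tuning choice of this sketch), builds the two-sided Brownian motion from independent Gaussian increments, and evaluates the left slope of the GCM through the max-min formula for isotonic regression. All function names are hypothetical.

```python
import math
import random


def boundary_points(c, k1, k2, M, rng):
    """Points (c*k, k1*W(c*k) + k2*c^2*k*(1+k)) for k = -M, ..., M."""
    sd = math.sqrt(c)  # increments of W over a step of length c are N(0, c)
    right, left = [0.0], [0.0]
    for _ in range(M):
        right.append(right[-1] + rng.gauss(0.0, sd))
        left.append(left[-1] + rng.gauss(0.0, sd))
    w = left[:0:-1] + right  # W(c*k) for k = -M, ..., M, with W(0) = 0
    ks = list(range(-M, M + 1))
    xs = [c * k for k in ks]
    ys = [k1 * wk + k2 * c * c * k * (1 + k) for k, wk in zip(ks, w)]
    return xs, ys


def left_slope_at(xs, ys, m):
    """Left slope at xs[m] of the GCM of the points, via the max-min formula."""
    return max(
        min((ys[j] - ys[i]) / (xs[j] - xs[i]) for j in range(m, len(xs)))
        for i in range(m)
    )


def draw_boundary_slope(c, k1, k2, M, rng):
    """One approximate draw of the slope process at 0 (index M is k = 0)."""
    xs, ys = boundary_points(c, k1, k2, M, rng)
    return left_slope_at(xs, ys, M)


# Monte Carlo sample with illustrative parameter values.
rng = random.Random(1)
draws = [draw_boundary_slope(1.0, 1.0, 1.0, 30, rng) for _ in range(200)]
```

Because of the quadratic drift, the slope at 0 depends on the points far from the origin only with small probability, which is why a moderate truncation level is a reasonable working approximation; a sensitivity check in $M$ is advisable before relying on such draws.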

3. Asymptotic results. In this section, we state and discuss results on
the asymptotic behavior of $\hat{F}(t_l)$ for $\gamma$ varying in $(0, 1]$. In all that follows,
we make the blanket assumption that $F$ is once continuously differentiable
in a neighborhood of $x_0$.

3.1. The case $\gamma < 1/3$. We start with some technical assumptions:

(A1.1) $F$ has a bounded density $f$ on $[a, b]$, and there exists $f_l > 0$ such
that $f(x) > f_l$ for every $x \in [a, b]$.

(A1.2) $G$ has a bounded density $g$ on $[a, b]$, and there exists $g_l > 0$ such
that $g(x) > g_l$ for every $x \in [a, b]$.

(A1.3) $a' < a$ and $F(a) > 0$.

The above assumptions are referred to collectively as (A1). Letting $t_r$ denote
the first grid-point to the right of $t_l$, we have the following theorem.

Theorem 3.1. If $\gamma \in (0, 1/3)$ and (A1) holds,

$$(\sqrt{N_l}(\hat{F}(t_l) - F(t_l)), \sqrt{N_r}(\hat{F}(t_r) - F(t_r))) \stackrel{d}{\to} \sqrt{F(x_0)(1 - F(x_0))}\, N(0, I_2),$$

where $I_2$ is the $2 \times 2$ identity matrix.
The proof of this theorem is provided in the supplement to this paper
[Tang, Banerjee and Kosorok (2011)]. However, a number of remarks in 
connection with the above theorem are in order. 

Remark 3.2. From Theorem 3.1, the quantities $\hat{F}(t_l)$ and $\hat{F}(t_r)$ with
proper centering and scaling are asymptotically uncorrelated and independent.
In fact, they are essentially the averages of the responses at the two
grid points $t_l$ and $t_r$ and are therefore based on responses corresponding to
different sets of individuals. Consequently, there is no dependence between
them in the long run. Intuitively speaking, $\gamma \in (0, 1/3)$ corresponds to very
sparse grids with successive grid points far enough apart so that the responses at
different grid points fail to influence each other.

It can be shown that for $\gamma \in (0, 1/3)$, $N_l/(np_l)$ converges to 1 in probability
and that $np_l/(cg(x_0)n^{1-\gamma})$ converges to 1. Then the result of Theorem 3.1 can
be rewritten as follows:

$$(n^{(1-\gamma)/2}(\hat{F}(t_l) - F(t_l)), n^{(1-\gamma)/2}(\hat{F}(t_r) - F(t_r))) \stackrel{d}{\to} \alpha c^{-1/2} N(0, I_2), \tag{3.1}$$

where $\alpha = \sqrt{F(x_0)(1 - F(x_0))/g(x_0)}$. This formulation will be used later,
and the parameter $\alpha$ will be seen to play a critical role in the asymptotic
behavior of $\hat{F}(t_l)$ when $\gamma \in [1/3, 1]$ as well.
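
The passage from Theorem 3.1 to (3.1) can be spelled out as follows. This is a heuristic sketch of the rescaling only, with the approximations justified by the two convergence statements quoted above together with Slutsky's lemma; the same computation applies at $t_r$.

```latex
% Heuristic step from Theorem 3.1 to (3.1); p_l = H_n\{t_l\} = G(t_l) - G(t_l - \delta).
\begin{align*}
p_l &= G(t_l) - G(t_l - \delta) \approx g(x_0)\,\delta = c\,g(x_0)\,n^{-\gamma},
  \qquad\text{so}\qquad N_l \approx n p_l \approx c\,g(x_0)\,n^{1-\gamma}, \\
n^{(1-\gamma)/2}\bigl(\hat F(t_l) - F(t_l)\bigr)
  &= \sqrt{\frac{n^{1-\gamma}}{N_l}}\;\sqrt{N_l}\,\bigl(\hat F(t_l) - F(t_l)\bigr)
  \approx \bigl(c\,g(x_0)\bigr)^{-1/2}\,\sqrt{N_l}\,\bigl(\hat F(t_l) - F(t_l)\bigr), \\
\bigl(c\,g(x_0)\bigr)^{-1/2}\sqrt{F(x_0)(1-F(x_0))}
  &= c^{-1/2}\sqrt{\frac{F(x_0)(1-F(x_0))}{g(x_0)}} = \alpha\,c^{-1/2}.
\end{align*}
```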

Remark 3.3. The proof of the above theorem relies heavily on the
proposition below, which deals with the vector of average responses at the
grid-points: $\{\bar{Z}_i\}_{i=1}^{K}$. Since $\bar{Z}_i$ is not defined when $N_i = 0$, to avoid ambiguity we
set $\bar{Z}_i = 0$ whenever this happens. This can be done without affecting the
asymptotic results, since it can be shown that the probability of the event
$\{N_i > 0, i = 1, 2, \ldots, K\}$ goes to 1.

Proposition 3.4. If $\gamma \in (0, 1/3)$ and (A1) holds, we have

$$P(\bar{Z}_1 \le \bar{Z}_2 \le \cdots \le \bar{Z}_K) \to 1.$$

This proposition is established in the supplement, Tang, Banerjee and
Kosorok (2011). It says that with probability going to 1, the vector $\{\bar{Z}_i\}_{i=1}^{K}$
is ordered, and therefore the isotonization algorithm involved in finding the
NPMLE of $F$ yields $\{F_i^*\}_{i=1}^{K} = \{\bar{Z}_i\}_{i=1}^{K}$ with probability going to 1. In other
words, asymptotically, isotonization has no effect, and the naive estimates
obtained by averaging the responses at each grid point produce the NPMLE.
This proposition is really at the heart of the asymptotic derivations for $\gamma < 1/3$
because it effectively reduces the problem of studying the $F_i^*$'s, which are
obtained through a complex nonlinear algorithm, to the study of the asymptotics
of the $\bar{Z}_i$, which are linear statistics and can be handled readily using
standard central limit theory. A phenomenon similar to the one in the
above proposition was observed by Kiefer and Wolfowitz (1976) in connection
with estimating the magnitude of the difference between the empirical
distribution function and its least concave majorant for an i.i.d. sample 
from a concave distribution function. See Theorem 1 of their paper and 
the preceding Lemma 4, which establish the concavity of a piecewise linear 
estimate of the true distribution obtained by linearly interpolating the re- 
striction of the empirical distribution to a grid with spacings of order slightly 
larger than $n^{-1/3}$, $n$ being the sample size. A similar result was obtained
in Lemma 3.1 of Zhang, Kim and Woodroofe (2001) in connection with iso- 
tonic estimation of a decreasing density when the exact observations are not 
available; rather, the numbers of data-points that fall into equi-spaced bins 
are observed. 
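
A back-of-the-envelope comparison suggests why the threshold in Proposition 3.4 sits at $\gamma = 1/3$. The display below is only a heuristic for a single pair of adjacent grid points under (A1); it is not the argument of the supplement, which has to control all $K$ comparisons simultaneously.

```latex
\begin{align*}
\text{signal: } & F(t_{i+1}) - F(t_i) \approx f(t_i)\,\delta \asymp n^{-\gamma}, \\
\text{noise: }  & \operatorname{sd}(\bar Z_i \mid N_i) = \sqrt{F_i(1-F_i)/N_i}
                  \asymp (n p_i)^{-1/2} \asymp n^{-(1-\gamma)/2}, \\
\text{signal} \gg \text{noise} \iff & \gamma < \tfrac{1-\gamma}{2} \iff \gamma < \tfrac13 .
\end{align*}
```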

3.2. The case $\gamma \in (1/3, 1]$. Our treatment will be condensed since the
asymptotics for this case follow the same patterns as when the observation
times possess a Lebesgue density. That this ought to be the case is suggested,
for example, by Theorem 1 in Wright (1981); see, in particular, the
condition on the rate of convergence of the empirical distribution function
of the regressors to the true distribution function in the case that $\alpha = 1$
in that theorem, which corresponds to the setting $\gamma > 1/3$ in our problem.
Note that the $\alpha$ in the previous sentence refers to notation in Wright (1981)
and should not be confused with the $\alpha$ defined in this paper.

In order to study the asymptotics of the isotonic regression estimator $\hat{F}(t_l)$,
the following localized process will be of interest: for $u \in I_n = [(a - t_l)n^{1/3},
(b - t_l)n^{1/3}]$, define

$$\mathbb{X}_n(u) = n^{1/3}(\hat{F}(t_l + un^{-1/3}) - F(t_l)). \tag{3.2}$$

Next, define the following normalized processes on $I_n$:

$$G_n^*(h) = g(x_0)^{-1} n^{1/3}(G_n(t_l + hn^{-1/3}) - G_n(t_l)), \tag{3.3}$$

$$V_n^*(h) = g(x_0)^{-1} n^{2/3}[V_n(t_l + hn^{-1/3}) - V_n(t_l) - F(t_l)(G_n(t_l + hn^{-1/3}) - G_n(t_l))]. \tag{3.4}$$

After some straightforward algebra, from (2.3) and (3.2), we have the following
technically useful characterization of $\mathbb{X}_n$: for $u \in I_n$,

$$\mathbb{X}_n(u) = \mathrm{LS}[\mathrm{GCM}\{(G_n^*(h), V_n^*(h)), h \in I_n\}](G_n^*(u)). \tag{3.5}$$

Let $\alpha$ be defined as in Remark 3.2 and $\beta = f(x_0)/2$. We have the following
theorem on the distributional convergence of $\mathbb{X}_n$.

Theorem 3.5 (Weak convergence of $\mathbb{X}_n$). Suppose $F$ and $G$ are continuously
differentiable in a neighborhood of $x_0$ with derivatives $f$ and $g$. Assume
that $f(x_0) > 0$, $g(x_0) > 0$ and that $g$ is Lipschitz continuous in a neighborhood
of $x_0$. Then, the finite-dimensional marginals of the process $\mathbb{X}_n$ converge
weakly to those of the process $g_{\alpha,\beta}$.
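
A moment calculation indicates where the constants $\alpha$ and $\beta$ in the limit come from. The display below is a heuristic sketch rather than a proof: it takes $h > 0$, writes $s = hn^{-1/3}$, and treats $H_n$ near $x_0$ as if it had density $g$, an approximation that is plausible when the grid is fine relative to $n^{-1/3}$.

```latex
\begin{align*}
E\,G_n^*(h) &\approx g(x_0)^{-1} n^{1/3} \cdot g(x_0)\,s = h, \\
E\,V_n^*(h) &= g(x_0)^{-1} n^{2/3}\, E\bigl[(F(X) - F(t_l))\,\{t_l < X \le t_l + s\}\bigr]
  \approx g(x_0)^{-1} n^{2/3} \cdot f(x_0)\,g(x_0)\,\frac{s^2}{2}
  = \frac{f(x_0)}{2}\,h^2 = \beta h^2, \\
\operatorname{Var} V_n^*(h) &\approx g(x_0)^{-2} n^{4/3} \cdot \frac{1}{n}\,F(x_0)(1 - F(x_0))\,g(x_0)\,s
  = \frac{F(x_0)(1 - F(x_0))}{g(x_0)}\,h = \alpha^2 h .
\end{align*}
```

These moments match those of the pair $(h, \alpha W(h) + \beta h^2)$, that is, of the process $X_{\alpha,\beta}$ in (2.4) plotted against the identity, which is consistent with the slope process $g_{\alpha,\beta}$ appearing in the limit.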
Remark 3.6. Note that $\mathbb{X}_n(0) = n^{1/3}(\hat{F}(t_l) - F(t_l))$. By Theorem 3.5,
it converges in distribution to $g_{\alpha,\beta}(0)$. By the Brownian scaling results on
page 1724 of Banerjee and Wellner (2001), for $h \in \mathbb{R}$,

$$g_{\alpha,\beta}(h) \stackrel{d}{=} (\alpha^2\beta)^{1/3} g_{1,1}((\beta/\alpha)^{2/3} h).$$

Then, by noting that $g_{1,1}(0) \stackrel{d}{=} 2Z$, where $Z$ follows Chernoff's distribution,
we have the following result:

$$n^{1/3}(\hat{F}(t_l) - F(t_l)) \stackrel{d}{\to} \left(\frac{4 f(x_0) F(x_0)(1 - F(x_0))}{g(x_0)}\right)^{1/3} Z. \tag{3.6}$$

Thus, the limit distribution of $\hat{F}(t_l)$ is exactly the same as one would encounter
in the current status model with survival distribution $F$ and the
observation times drawn from a Lebesgue density function $g$. The proof of
this theorem is omitted as it can be established via arguments similar to
those in Banerjee (2007) using continuous mapping theorems for slopes of
greatest convex minorants.
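
For completeness, the scaling identity used in Remark 3.6 and the constant in (3.6) can be verified directly. The following worked computation is a sketch based only on the self-similarity of Brownian motion and the definitions in (2.4) and (2.5); the auxiliary constant $k$ is introduced here for convenience.

```latex
% Put k = (\alpha/\beta)^{2/3} and use W(ks) \stackrel{d}{=} k^{1/2} W(s) as processes in s.
\begin{align*}
X_{\alpha,\beta}(ks) &= \alpha W(ks) + \beta k^2 s^2
  \stackrel{d}{=} \alpha k^{1/2} W(s) + \beta k^2 s^2
  = \alpha^{4/3}\beta^{-1/3}\bigl(W(s) + s^2\bigr)
  = \alpha^{4/3}\beta^{-1/3} X_{1,1}(s), \\
g_{\alpha,\beta}(h) &\stackrel{d}{=} \frac{\alpha^{4/3}\beta^{-1/3}}{k}\, g_{1,1}(h/k)
  = (\alpha^2\beta)^{1/3}\, g_{1,1}\bigl((\beta/\alpha)^{2/3} h\bigr), \\
2\,(\alpha^2\beta)^{1/3} &= (8\alpha^2\beta)^{1/3}
  = \Bigl(\frac{4 f(x_0) F(x_0)(1 - F(x_0))}{g(x_0)}\Bigr)^{1/3}.
\end{align*}
```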

3.3. The case $\gamma = 1/3$. Now, we consider the most interesting boundary
case $\gamma = 1/3$. Let the localized process $\mathbb{X}_n(u)$ be defined exactly as in the
previous subsection. The order of the grid-spacing $\delta$ is now exactly $n^{-1/3}$,
which is the order of localization around $t_l$ used to define the process $\mathbb{X}_n$,
and it follows that $\mathbb{X}_n$ has potential jumps only at $ci$ for $i \in \mathcal{I}_n = (I_n/c) \cap \mathbb{Z}$,
and it suffices to consider $\mathbb{X}_n$ on those $ci$'s. For $i \in \mathcal{I}_n$,

$$\mathbb{X}_n(ci) = n^{1/3}(\hat{F}(t_l + ci\,n^{-1/3}) - F(t_l)) \tag{3.7}$$

$$= \mathrm{LS}[\mathrm{GCM}\{(G_n^*(ck), V_n^*(ck)), k \in \mathcal{I}_n\}](G_n^*(ci)). \tag{3.8}$$

For simplicity of notation, in the remainder of this section, we will often
write an integer interval as a usual interval with two integer endpoints. This
will, however, not cause confusion since the interpretation of the interval
will be immediate from the context.

The following theorem gives the limit behavior of $\mathbb{X}_n$.

Theorem 3.7 (Weak convergence of $\mathbb{X}_n$). Under the same assumptions
as in Theorem 3.5, for each nonnegative integer $N$, we have

$$\{\mathbb{X}_n(ci), i \in [-N, N]\} \stackrel{d}{\to} \{\mathbb{X}_{c,\alpha,\beta}(ci), i \in [-N, N]\}.$$

It follows that $n^{1/3}(\hat{F}(t_l) - F(t_l)) \stackrel{d}{\to} \mathbb{X}_{c,\alpha,\beta}(0)$.

Remark 3.8. It is interesting to note the change in the limiting behavior
of the NPMLE with varying $\gamma$. As noted previously, for $\gamma \in (0, 1/3)$, the
grid is sparse enough so that the naive average responses at each inspection
time, which provide empirical estimates of $F$ at those corresponding
inspection times, are automatically ordered (and therefore the solution to
the isotonic regression problem) and there is no "strength borrowed" from
nearby inspection times. Consequently, a Gaussian limit is obtained. For
$\gamma > 1/3$, the grid points are "close enough," so that the naive pointwise averages
are no longer the best estimates of $F$. In fact, owing to the closeness
of successive grid-points, the naive averages are no longer ordered, and the
pool adjacent violators algorithm (PAVA) leads to a nontrivial solution
for the NPMLE which is a highly nonlinear functional of the data, putting
us in the setting of nonregular asymptotics. It turns out that for $\gamma \ge 1/3$,
the order of the local neighborhoods of $t_l$ that determine the value of $\hat{F}(t_l)$
is $n^{-1/3}$. When $\gamma = 1/3$, the resolution of the grid matches the order of the
local neighborhoods, leading in the limit to a process in discrete-time that
depends on $c$. When $\gamma > 1/3$, the number of grid-points in an $n^{-1/3}$ neighborhood
of $t_l$ goes to infinity. This eventually washes out the dependence
on $c$ and also produces, in the limit, a process in continuous time.
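
The three regimes in this remark can be summarized by counting grid points in a local window. The display below is a simple order-of-magnitude calculation using only the spacing $\delta = cn^{-\gamma}$.

```latex
\[
\#\{\text{grid points in a window of width } n^{-1/3}\}
  \approx \frac{n^{-1/3}}{\delta} = \frac{n^{-1/3}}{c\,n^{-\gamma}} = \frac{n^{\gamma - 1/3}}{c}
  \longrightarrow
  \begin{cases}
    0, & \gamma < 1/3, \\
    1/c, & \gamma = 1/3, \\
    \infty, & \gamma > 1/3 .
  \end{cases}
\]
```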

For the rest of this section, we refer to the process $\mathbb{X}_{c,\alpha,\beta}$ simply as $\mathbb{X}$ and
the process $\mathcal{P}_{c,\alpha,\beta}$ as $\mathcal{P}_c$.

Proof-sketch of Theorem 3.7. The key steps of the proof are as
follows. Take an integer $M > N$. Then, the following two claims hold.

Claim 1. There exist (integer-valued) random variables $L_n < -M$ and
$U_n > M$ which are $O_P(1)$ and satisfy

$$\mathrm{GCM}\{(G_n^*(ck), V_n^*(ck)), k \in [L_n, U_n]\} = \mathrm{GCM}\{(G_n^*(ck), V_n^*(ck)), k \in \mathbb{Z}\} \,|\, [G_n^*(cL_n), G_n^*(cU_n)].$$

Claim 2. There also exist (integer-valued) random variables $L < -M$
and $U > M$ such that $L$, $U$ are $O_P(1)$ and that

$$\mathrm{GCM}\{\mathcal{P}_c(k), k \in [L, U]\} = \mathrm{GCM}\{\mathcal{P}_c(k), k \in \mathbb{Z}\} \,|\, [cL, cU].$$

For the proofs of these claims, see Tang, Banerjee and Kosorok (2011).
We next need a key approximation lemma, which is a simple extension of
Lemma 4.2 in Prakasa Rao (1969).

Lemma 3.9. Suppose that for each $\varepsilon > 0$, $\{W_{n\varepsilon}\}$, $\{W_n\}$ and $\{W_\varepsilon\}$ are
sequences of random vectors, $W$ is a random vector and that:

(1) $\lim_{\varepsilon \to 0} \lim_{n \to \infty} P(W_{n\varepsilon} \ne W_n) = 0$,

(2) $\lim_{\varepsilon \to 0} P(W_\varepsilon \ne W) = 0$,

(3) $W_{n\varepsilon} \stackrel{d}{\to} W_\varepsilon$, as $n \to \infty$ for each $\varepsilon > 0$.

Then $W_n \stackrel{d}{\to} W$, as $n \to \infty$.
From Claims 1 and 2, for every (small) $\varepsilon > 0$, there exists an integer $M_\varepsilon$
large enough such that

$$P(M_\varepsilon > \max\{|L_n|, U_n, |L|, U\}) > 1 - \varepsilon.$$

Denote, for $i \in [-N, N]$,

$$\mathbb{X}_n^{M_\varepsilon}(ci) = \mathrm{LS}[\mathrm{GCM}\{(G_n^*(ck), V_n^*(ck)), k \in [\pm M_\varepsilon]\}](G_n^*(ci)),$$

$$\mathbb{X}^{M_\varepsilon}(ci) = \mathrm{LS}[\mathrm{GCM}\{\mathcal{P}_c(k), k \in [\pm M_\varepsilon]\}](ci).$$

Denote $[\pm N] = [-N, N]$ and

$$A_n = \{\{\mathbb{X}_n^{M_\varepsilon}(ci), i \in [\pm N]\} \ne \{\mathbb{X}_n(ci), i \in [\pm N]\}\},$$

$$A = \{\{\mathbb{X}^{M_\varepsilon}(ci), i \in [\pm N]\} \ne \{\mathbb{X}(ci), i \in [\pm N]\}\}.$$

Then, the following three facts hold:

Fact 1. $\lim_{\varepsilon \to 0} \lim_{n \to \infty} P(A_n) = 0$.

Fact 2. $\lim_{\varepsilon \to 0} P(A) = 0$.

Fact 3. $\{\mathbb{X}_n^{M_\varepsilon}(ci), i \in [\pm N]\} \stackrel{d}{\to} \{\mathbb{X}^{M_\varepsilon}(ci), i \in [\pm N]\}$, as $n \to \infty$ for each
$\varepsilon > 0$.

Facts 1 and 2 follow since $A_n$ and $A$ are subsets of $\{M_\varepsilon \le \max\{|L_n|, U_n,
|L|, U\}\}$, whose probability is less than $\varepsilon$. Fact 3 is proved
in Tang, Banerjee and Kosorok (2011). A direct application of Lemma 3.9
then leads to the weak convergence that we sought to prove. □

Remark 3.10. The proofs of Claims 1 and 2 consist of technically important
localization arguments. Claim 1 ensures that eventually, with arbitrarily
high pre-specified probability, the restriction of the greatest convex
minorant of the process $(G_n^*, V_n^*)$ (which is involved in the construction
of $\mathbb{X}_n$) to a bounded domain can be made equal to the greatest convex
minorant of the restriction of $(G_n^*, V_n^*)$ to that domain, provided the domain
is chosen appropriately large, depending on the pre-specified probability. It
can be proved by using techniques similar to those in Section 6 of Kim and
Pollard (1990). Claim 2 ensures that an analogous phenomenon holds for
the greatest convex minorant of the process $\mathcal{P}_c$, which is involved in the
construction of $\mathbb{X}$. These equalities then translate to the left-derivatives of
the GCMs involved, and the proof is completed by invoking a continuous
mapping theorem for the GCMs of the restriction of $(G_n^*, V_n^*)$ to bounded
domains, along with Claims 1 and 2, which enable the use of the approximation
lemma adapted from Prakasa Rao (1969).

The basic strategy of the above proof has been invoked time and again
in the literature on monotone function estimation. Prakasa Rao (1969) em- 
ployed this technique to determine the limit distribution of the Grenander 
estimator at a point, and Brunk (1970) for studying monotone regression. 
Leurgans (1982) extended these techniques to more general settings which 
cover weakly dependent data while Anevski and Hössjer (2006) provided
a comprehensive and unified treatment of asymptotic inference under or- 
der restrictions, applicable to independent as well as short and long range 
dependent data. This technique was also used in Banerjee (2007) to study 
the asymptotic distributions of a very general class of monotone response 
models. It ought to be possible to bring the general techniques of Anevski 
and Hössjer (2006) to bear upon the boundary case, but we have not investigated
that option; our proof-strategy is most closely aligned with the
proof of Theorem 2.1 in Banerjee (2007). 

3.4. A brief discussion of the boundary phenomenon. We refer to the 
behavior of the NPMLE for $\gamma = 1/3$ as the boundary phenomenon. As indicated
in the Introduction, the asymptotic distribution for $\gamma = 1/3$ is different
from both the Gaussian (which comes into play for $\gamma < 1/3$) and the Chernoff
(which arises for $\gamma > 1/3$). This boundary distribution, which depends on
the scale parameter, c, can be viewed as an intermediate between the Gaus- 
sian and Chernoff, and its degree of proximity to one or the other is dictated 
by c as we demonstrate in the following section. More importantly, this tran- 
sition from one distribution to another via the boundary one, has important 
ramifications for inference in our grid-based problem as also demonstrated 
in the next section. 

The closest result to our boundary phenomenon in the literature appears
in the work of Zhang, Kim and Woodroofe (2001), who study the asymptotics
of isotonic estimation of a decreasing density with histogram-type data.
Thus, the domain of the density is split into a number of pre-specified
bins, and the statistician knows the number of i.i.d. observations from the
density that fall into each bin (with a total of $n$ such observations). The
rate at which the number of bins increases relative to $n$ then drives the
asymptotics of the NPMLE of the density within the class of decreasing
piecewise linear densities, with a distribution similar to X(0) appearing when
this number increases at rate $n^{1/3}$. However, unlike us, Zhang, Kim and
Woodroofe (2001) do not establish any connections among the different limiting
regimes; neither do they offer a prescription for inference when the rate of
growth of the bins is unknown, as is usually the case in practice.

It is worthwhile contrasting our boundary phenomenon with those observed
by some other authors. Anevski and Hössjer (2006) discover a "boundary
effect" in their Theorems 5 and 6.1 when dealing with an isotonized
version of a kernel estimate (see Section 3.3 of their paper). In the setting of
i.i.d. data, when the smoothing bandwidth is chosen to be of order $n^{-1/3}$,
the asymptotics of the isotonized kernel estimator are given by the minimizer
of a Gaussian process (depending on the kernel) with continuous sample paths
plus a quadratic drift, whereas for bandwidths of larger orders than $n^{-1/3}$
normal distributions obtain. A similar phenomenon, in the setting of monotone
density estimation, was observed by van der Vaart and van der Laan
(2003) in their Theorem 2.2 for an isotonized kernel estimate of a decreasing
density while using an $n^{-1/3}$ order bandwidth. Note that these boundary
effects are quite different from our boundary phenomenon. In Anevski and
Hössjer's setting, for example, the underlying regression model is observed
on the grid $\{i/n\}$, with one response per grid-point. Kernel estimation with
an $n^{-1/3}$ bandwidth smooths the responses over time-neighborhoods of
order $n^{-1/3}$, producing a continuous estimator which is then subjected to
isotonization. This leads to a limit that is characterized in terms of a process
in continuous time. In our setting, our data are not necessarily observed on an
$\{i/n\}$ grid; our grids can be much sparser and, for the case $\gamma = 1/3$,
multiple responses are available at each grid-point. The NPMLE isotonizes the
$Z_j$'s; thus, isotonization is preceded by averaging the multiple responses at
each time cross-section, but there is no averaging of responses across time, in
sharp contrast to Anevski and Hössjer's setting. This, in conjunction with
the fact, already noted at the beginning of this subsection, that the grid
resolution when $\gamma = 1/3$ has the same order as the localization involved
in constructing the process $X_n$, leads in our case to a limit distribution for
the NPMLE that is characterized as a functional of a process in discrete time.
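
The averaging-then-isotonizing structure described above can be made concrete with a short Python sketch. It computes a grid NPMLE as the weighted isotonic regression of the per-grid-point averages of the status indicators, with the number of subjects at each grid point as weights. This is the standard characterization of the current status NPMLE rather than code from the paper; the function names `pava` and `npmle_on_grid` and the input layout are assumptions made for illustration.

```python
import numpy as np


def pava(y, w):
    """Weighted nondecreasing isotonic regression by pool-adjacent-violators."""
    vals, wts, cnts = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi))
        wts.append(float(wi))
        cnts.append(1)
        # Pool the last two blocks while they violate monotonicity.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            wsum = wts[-2] + wts[-1]
            pooled = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / wsum
            count = cnts[-2] + cnts[-1]
            vals[-2:] = [pooled]
            wts[-2:] = [wsum]
            cnts[-2:] = [count]
    return np.repeat(vals, cnts)


def npmle_on_grid(obs_times, status):
    """Return (grid points, counts N_i, NPMLE of F at each grid point).

    obs_times: observation time of each subject (ties allowed, hypothetical input)
    status:    1 if the event has occurred by the observation time, else 0
    """
    obs_times = np.asarray(obs_times, dtype=float)
    status = np.asarray(status, dtype=float)
    grid, inverse = np.unique(obs_times, return_inverse=True)
    counts = np.bincount(inverse)                           # N_i
    averages = np.bincount(inverse, weights=status) / counts  # averaged responses
    return grid, counts, pava(averages, counts)
```

Responses are pooled only across adjacent grid points whose averages violate monotonicity, which is the sense in which there is no smoothing across time.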

4. Adaptive inference for F at a point. In this section, we develop a
procedure for constructing asymptotic confidence intervals for $F(t_l)$ which
does not require knowing or estimating the underlying grid resolution controlled
by the parameters $\gamma$ and $c$. This is a major advantage from an
inferential perspective because the parameter $\gamma$ critically drives the
limit distribution of the NPMLE and mis-specification of $\gamma$ may result in
asymptotically incorrect confidence sets, either due to the use of the wrong
limit distribution or due to an incorrect convergence rate, or both.

To this end, we first investigate the relationships among the three different
asymptotic limits for $\hat F(t_l)$ that were derived in the previous section,
for different values of $\gamma$. In what follows, we denote
$X_{c,\alpha,\beta}(0)$ by $S_c$, suppressing the dependence on $\alpha, \beta$
for notational convenience. The use of the letter S is to emphasize the
characterization of this random variable as the slope of a stochastic process.

Our first result relates the distribution of $S_c$ to the Gaussian.

Theorem 4.1. As $c \to \infty$, $\sqrt{c}\, S_c \to_d \alpha Z$, where $Z$
follows the standard normal distribution.

[Figure 1: left panel "CDFs of Scaled Sc Converge to Gaussian"; right panel
"CDFs of Scaled Sc Converge to Chernoff".]

Fig. 1. The left and right panels show that a sequence of empirical CDFs of the
properly scaled $S_c$ converge to the standard Gaussian and Chernoff
distributions, respectively. In the left panel, the empirical CDFs with
$c \ge 3$ almost coincide with the standard Gaussian distribution.

Our next result investigates the case where $c$ goes to 0.

Theorem 4.2. As $c \to 0$, $S_c \to_d g_{\alpha,\beta}(0) =_d
2(\alpha^2\beta)^{1/3} \mathcal{Z}$.

Remark 4.3. Theorem 4.2 is somewhat easier to visualize heuristically,
compared to Theorem 4.1. Recall that $S_c$ is the left-slope of the GCM of the
process $\mathcal{P}_c$ at the point 0, the process itself being defined on the
grid $c\mathbb{Z}$. As $c$ goes to 0, the grid becomes finer, and the process
$\mathcal{P}_c$ is eventually substituted by its limiting version, namely
$X_{\alpha,\beta}$. Thus, in the limit, $S_c$ becomes $g_{\alpha,\beta}(0)$,
the left-slope of the GCM of $X_{\alpha,\beta}$ at 0. The representation of this
limit in terms of $\mathcal{Z}$ was established in Remark 3.6 following
Theorem 3.5.

The results of Theorems 4.1 and 4.2 are illustrated next. Suppose the time
interval $[a, b]$ is $[0, 2]$, $x_0 = 1$ and that $F$ and $G$ are both the
uniform distribution on $[0, 2]$. Under these settings, the values of $\alpha$
and $\beta$ are $\sqrt{2}/4$ and $1/4$, respectively. We generate i.i.d. random
samples of $S_c$ with $c$ being 1, 2, 3, 5 and 10 and the common sample size
being 5000. The left panel of Figure 1 compares the empirical cumulative
distribution functions (CDFs) of $\sqrt{c}\, S_c/\alpha$ with the standard
Gaussian distribution $N(0, 1)$. It shows clearly that the empirical CDFs move
closer to the Gaussian distribution with increasing $c$ and that the empirical
CDF of $\sqrt{c}\, S_c/\alpha$ with $c$ equal to 3 already provides a decent
approximation to $N(0, 1)$. On the other hand, the right panel of Figure 1
compares the empirical CDFs of $(1/2)(\alpha^2\beta)^{-1/3} S_c$ with the
standard Chernoff distribution $\mathcal{Z}$. Again, the empirical CDFs approach
that of $\mathcal{Z}$ with diminishing $c$, with $c = 1$ providing a close
approximation to $\mathcal{Z}$. Note that, while the convergence in this setting
is relatively quick in the sense that the limiting phenomena manifest themselves
at moderate values of $c$ (i.e., neither too large, nor too small), this may not
necessarily be the case for other combinations of $(\alpha, \beta)$, and more
extreme values may be required for good enough approximations.

The adaptive inference scheme: We are now in a position to propose our
inference scheme. We focus on the so-called "Wald-type" intervals for
$F(t_l)$, that is, intervals of the form $\hat F(t_l)$ plus and minus terms
depending on the sample size and the large sample distribution of the
estimator. Let $c_0$ and $\gamma_0$ denote the true unknown values of $c$ and
$\gamma$ in the current status model. With $K = K_n$ being the number of
grid-points, we have the relation

$K_n = \lfloor (b-a)/(c_0 n^{-\gamma_0}) \rfloor$.

Now pretend that the true $\gamma$ is exactly equal to 1/3. Calculate a
surrogate $c$, say $\hat c$, via the relation

$\lfloor (b-a)/(\hat c\, n^{-1/3}) \rfloor = K_n$.

Some algebra shows that

$\hat c = \hat c_n = c_0 n^{1/3-\gamma_0} + O(n^{1/3-2\gamma_0})
= c_0 n^{1/3-\gamma_0}(1 + O(n^{-\gamma_0}))$.

Thus, the calculated parameter $\hat c$ actually depends on $n$, and goes to
$\infty$ and 0 for $\gamma_0 \in (0, 1/3)$ and $\gamma_0 \in (1/3, 1]$,
respectively.
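
As a small illustration of the surrogate computation, the Python sketch below returns one value of $\hat c$ that satisfies the floor relation for a given number of grid points. The relation does not pin down $\hat c$ uniquely: any value in the interval $((b-a)n^{1/3}/(K+1),\ (b-a)n^{1/3}/K]$ works, and all such values share the same leading-order behavior. The particular choice made here (dividing by $K + 0.5$) and the function names are assumptions for illustration, not the authors' prescription.

```python
import math


def grid_count(c, gamma, n, a, b):
    """Number of grid points K = floor((b - a) / (c * n**(-gamma)))."""
    return math.floor((b - a) / (c * n ** (-gamma)))


def surrogate_c(K, n, a, b):
    """One solution c_hat of floor((b - a) / (c_hat * n**(-1/3))) == K.

    Dividing by K + 0.5 (an assumption of this sketch) keeps the solution
    away from the interval endpoints, where floating-point rounding could
    flip the floor.
    """
    if K < 1 or n < 1 or b <= a:
        raise ValueError("need K >= 1, n >= 1 and a < b")
    return (b - a) * n ** (1.0 / 3.0) / (K + 0.5)
```

For instance, `surrogate_c(20, 400, 0.0, 1.0)` gives a value close to $c_0 n^{1/3-\gamma_0}$ for $c_0 = 1$ and $\gamma_0 = 1/2$, the configuration that produces 20 grid points on $[0, 1]$ when $n = 400$.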

We propose to use the distribution of $S_{\hat c}$ as an approximation to the
distribution of $n^{1/3}(\hat F(t_l) - F(t_l))$. Thus, an adaptive approximate
$1 - \eta$ confidence interval for $F(t_l)$ is given by

(4.1)  $[\hat F(t_l) - n^{-1/3} q(S_{\hat c}, 1 - \eta/2),\
\hat F(t_l) - n^{-1/3} q(S_{\hat c}, \eta/2)]$,

where $\eta > 0$ and $q(X, p)$ stands for the lower $p$th quantile of a random
variable $X$ with $p \in (0, 1)$.
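
The interval (4.1) is straightforward to evaluate once draws from the distribution of $S_{\hat c}$ are available. The following Python sketch shows the arithmetic, with the quantiles replaced by empirical quantiles of simulated draws. The function name `adaptive_wald_ci` and the argument `S_draws` are hypothetical; how the draws are produced is left to the caller.

```python
import numpy as np


def adaptive_wald_ci(F_hat, n, S_draws, eta=0.05):
    """Evaluate interval (4.1) using empirical quantiles of draws of S_c-hat.

    F_hat:   NPMLE of F at the grid point of interest
    n:       sample size
    S_draws: simulated draws approximating the law of S at the surrogate c
    eta:     one minus the nominal coverage
    """
    S_draws = np.asarray(S_draws, dtype=float)
    q_upper = np.quantile(S_draws, 1.0 - eta / 2.0)
    q_lower = np.quantile(S_draws, eta / 2.0)
    scale = n ** (-1.0 / 3.0)
    # The upper quantile sets the lower endpoint and vice versa.
    return F_hat - scale * q_upper, F_hat - scale * q_lower
```

The endpoints are not truncated to $[0, 1]$ here, so that the sketch matches the displayed formula exactly.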

Asymptotic validity of the proposed inference scheme: The above adaptive
confidence interval provides the correct asymptotic calibration, irrespective
of the true value of $\gamma$. If $\gamma_0$ happens to be 1/3, then, of course,
the adaptive confidence interval is constructed with the correct asymptotic
result. If not, consider first the case that $\gamma_0 \in (1/3, 1]$. If we knew
that $\gamma_0 \in (1/3, 1]$, then, by result (3.6) and the symmetry of
$g_{\alpha,\beta}(0)$, the true confidence interval would be

(4.2)  $[\hat F(t_l) \pm n^{-1/3} q(g_{\alpha,\beta}(0), 1 - \eta/2)]$.

Now recall that $\hat c$ goes to 0 since $\gamma_0 \in (1/3, 1]$. Thus, by
Theorem 4.2, the quantile sequence $q(S_{\hat c}, p)$ converges to
$q(g_{\alpha,\beta}(0), p)$, owing to the fact that $g_{\alpha,\beta}(0)$ is a
continuous random variable. So, the adaptive confidence interval (4.1) converges
to the true one (4.2) obtained when $\gamma_0$ is in $(1/3, 1]$.

That the adaptive procedure also works when $\gamma_0 \in (0, 1/3)$ will be
shown by using Theorem 4.1. Again, suppose we know the value of $\gamma_0$.
Then, from result (3.1) and the symmetry of the standard normal random variable
$Z$, the confidence interval is given by

(4.3)  $[\hat F(t_l) \pm n^{-(1-\gamma_0)/2} \alpha c_0^{-1/2} q(Z, 1 - \eta/2)]$.

To show that the adaptive procedure is, again, asymptotically correct, it
suffices to show that for every $p \in (0, 1)$, as $n \to \infty$,

$$\frac{n^{-1/3} q(S_{\hat c}, p)}{n^{-(1-\gamma_0)/2} \alpha c_0^{-1/2} q(Z, p)}
= \frac{n^{-1/3} \hat c^{-1/2}}{n^{-(1-\gamma_0)/2} c_0^{-1/2}} \cdot
\frac{q(\sqrt{\hat c}\, S_{\hat c}, p)}{\alpha\, q(Z, p)} =: I \cdot II \to 1.$$

Recall that $\hat c$ goes to $\infty$ since $\gamma_0 \in (0, 1/3)$. By Theorem
4.1, we have $II \to 1$ as $n \to \infty$. On the other hand, we can see that
$I$ simplifies to $(1 + O(n^{-\gamma_0}))^{-1/2}$ and therefore goes to 1. Thus,
the adaptive confidence interval (4.1) also converges to the true one (4.3)
obtained when $\gamma_0$ is known to be in $(0, 1/3)$.

Thus, our procedure adjusts automatically to the inherent rate of growth 
of the number of distinct observation times and that is an extremely desirable 
property. 

We next articulate some practical issues with the adaptive procedure.
First, note that $S_{\hat c} = X_{\hat c,\alpha,\beta}(0)$, and in practice
$\alpha$ and $\beta$ are unknown, and therefore need to be estimated
consistently. We provide simple methods for consistent estimation of these two
parameters in the next section. Second, the random variable
$X_{\hat c,\alpha,\beta}(0)$ does not appear to admit a natural scaling in terms
of some canonical random variable: in other words, it cannot be represented as
$C(c, \alpha, \beta) J$ where $C$ is an explicit function of $c, \alpha, \beta$
and $J$ is some fixed well-characterized random variable. Thus, the quantiles of
$X_{\hat c,\hat\alpha,\hat\beta}(0)$ (where $\hat\alpha$ and $\hat\beta$ are
consistent estimates for the corresponding parameters) need to be calculated by
generating many sample paths from the parent process
$\mathcal{P}_{\hat c,\hat\alpha,\hat\beta}$ and computing the left slope of the
convex minorant of each such path at 0. This is, however, not a major issue in
these days of fast computing, and, in our opinion, the mileage obtained in terms
of adaptivity more than compensates for the lack of scaling. Finally, one may
wonder if resampling the NPMLE would allow adaptation with respect to $\gamma$.
The problem, however, lies in the fact that while the usual $n$ out of $n$
bootstrap works for the NPMLE when $\gamma \in (0, 1/3)$, it fails under the
nonstandard asymptotic regimes that operate for $\gamma \in [1/3, 1]$, as is
clear from the work of Abrevaya and Huang (2005), Kosorok (2008) and Sen,
Banerjee and Woodroofe (2010). Since $\gamma$ is unknown, it is impossible to
decide whether to use the standard $n$ out of $n$ bootstrap. One could argue
that the $m$ out of $n$ bootstrap or subsampling will work irrespective of the
value of $\gamma$, but the problem that arises here is that these procedures
require knowledge of the convergence rate and this is unknown as it depends on
the true value of $\gamma$.

5. A practical procedure and simulations. In this section, we provide
a practical version of the adaptive procedure introduced in Section 4 to
construct Wald-type confidence intervals for $F(t_l)$ and assess their
performance through simulation studies. The true values of $c$ and $\gamma$ are
denoted by $c_0$ and $\gamma_0$. The process $\mathcal{P}_{c,\alpha,\beta}$ is
again abbreviated to $\mathcal{P}_c$.

Recall that in the adaptive procedure, we always specify $\gamma = 1/3$ and
compute a surrogate for $c_0$, namely $\hat c$, as a solution of the equation
$K = \lfloor (b-a)/(c n^{-1/3}) \rfloor$, where $K$ is the number of grid
points. To construct a level $1 - 2\eta$ confidence interval for $F(t_l)$ for a
small positive $\eta$, quantiles of $S_{\hat c}$ are needed. Since
$S_c = \mathrm{LS}[\mathrm{GCM}\{\mathcal{P}_c(k), k \in \mathbb{Z}\}](0)$
($c$ is generically used), we approximate $S_c$ with

$X_{c,K_a}(0) = \mathrm{LS}[\mathrm{GCM}\{\mathcal{P}_c(k),
k \in [-K_a - 1, K_a]\}](0)$

for some large $K_a \in \mathbb{N}$. Further, since

$X_{c,K_a}(0) = \mathrm{LS}[\mathrm{GCM}\{(\mathcal{P}_{1,c}(k)/c,
\mathcal{P}_{2,c}(k)/c), k \in [-K_a - 1, K_a]\}](0)$,

where $\mathcal{P}_{1,c}(k)/c = k$ and
$\mathcal{P}_{2,c}(k)/c = \alpha W(ck)/c + \beta c k(1 + k)$, we get that
$X_{c,K_a}(0)$ is the isotonic regression at $k = 0$ of the data

$\{(k, \mathcal{P}_{2,c}(k)/c - \mathcal{P}_{2,c}(k-1)/c), k \in [-K_a, K_a]\}$

$= \{(k, \alpha Z_k/\sqrt{c} + 2\beta c k), k \in [-K_a, K_a]\}$,

where $\{Z_k\}_{k=-K_a}^{K_a}$ are i.i.d. from $N(0, 1)$,
$\alpha = \sqrt{F(x_0)(1 - F(x_0))/g(x_0)}$ and $\beta = f(x_0)/2$. To make this
adaptive procedure practical, we next consider the estimation of $\alpha$ and
$\beta$, or equivalently, the estimation of $F(x_0)$, $g(x_0)$ and $f(x_0)$.
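
The isotonic-regression representation above lends itself to direct simulation. The Python sketch below draws approximate realizations of $S_c$ by generating the data $\alpha Z_k/\sqrt{c} + 2\beta c k$ for $k \in [-K_a, K_a]$ and evaluating their isotonic regression at $k = 0$. The value at a single index is computed with the classical max-min formula for isotonic regression rather than a full pool-adjacent-violators pass; that choice, the function names and the default arguments are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np


def iso_value_at(y, j):
    """Value at index j of the unweighted nondecreasing isotonic regression of y.

    Uses the max-min formula: max over s <= j of min over t >= j of mean(y[s..t]).
    """
    y = np.asarray(y, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    s = np.arange(0, j + 1)      # candidate left endpoints
    t = np.arange(j, y.size)     # candidate right endpoints
    means = (csum[t + 1][None, :] - csum[s][:, None]) / (t[None, :] + 1 - s[:, None])
    return means.min(axis=1).max()


def simulate_S_c(c, alpha, beta, K_a=300, n_draws=3000, seed=0):
    """Approximate draws of S_c via the truncated representation X_{c,K_a}(0)."""
    rng = np.random.default_rng(seed)
    k = np.arange(-K_a, K_a + 1)
    draws = np.empty(n_draws)
    for r in range(n_draws):
        y = alpha * rng.standard_normal(k.size) / np.sqrt(c) + 2.0 * beta * c * k
        draws[r] = iso_value_at(y, K_a)   # index K_a corresponds to k = 0
    return draws
```

Empirical quantiles of the returned draws, evaluated at $c = \hat c$ with estimated $\alpha$ and $\beta$, can then stand in for the quantiles of $S_{\hat c}$.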

First, we consider the estimation of $F(x_0)$ and $g(x_0)$. Although $F(x_0)$
can be consistently estimated by $\hat F(t_l)$, in our simulations we estimate
$F(x_0)$ by $p\hat F(t_l) + (1 - p)\hat F(t_r)$ with
$p = (x_0 - t_l)/(t_r - t_l) \in [0, 1)$. To estimate $g(x_0)$, we use the
following estimating equation:
$(N_{l-j^*+1} + \cdots + N_{r+j^*})/n = \hat g(x_0)(t_{r+j^*} - t_{l-j^*})$,
where $j^*$ is defined below in the estimation of $f(x_0)$. Since the design
density $g$ is assumed to be continuous in a neighborhood of $x_0$, and the
interval $[t_{l-j^*}, t_{r+j^*}]$ is shrinking to $x_0$, it is reasonable to
approximate $g$ over the interval $[t_{l-j^*}, t_{r+j^*}]$ with a constant
function. Thus, from the above estimating equation, one simple but consistent
estimator of $g(x_0)$ is given by
$\hat g(x_0) = (N_{l-j^*+1} + \cdots + N_{r+j^*})/[n(t_{r+j^*} - t_{l-j^*})]$.
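
A minimal Python sketch of this locally constant estimator follows. It uses the $1/\log n$ rule for $j^*$ that is spelled out in the estimation of $f(x_0)$, takes $r = l + 1$, and treats the grid as a zero-based array. The function names and the array layout are assumptions for illustration.

```python
import numpy as np


def find_j_star(N, l_idx, n):
    """Smallest j with sum_{i=l-j}^{l+j} N_i / n > 1 / log(n)."""
    N = np.asarray(N)
    threshold = 1.0 / np.log(n)
    j = 0
    while l_idx - j >= 0 and l_idx + j < N.size:
        if N[l_idx - j:l_idx + j + 1].sum() / n > threshold:
            return j
        j += 1
    raise ValueError("window left the grid before reaching the 1/log(n) threshold")


def g_hat(t, N, l_idx, n):
    """Locally constant estimate of the design density g at x_0.

    t:     grid points (sorted), N: number of subjects observed at each grid point
    l_idx: zero-based index of t_l; the right neighbor t_r has index l_idx + 1
    """
    t = np.asarray(t, dtype=float)
    N = np.asarray(N)
    j = find_j_star(N, l_idx, n)
    lo, hi = l_idx - j, l_idx + 1 + j       # indices of t_{l-j*} and t_{r+j*}
    if hi >= t.size:
        raise ValueError("window left the grid on the right")
    count = N[lo + 1:hi + 1].sum()          # N_{l-j*+1} + ... + N_{r+j*}
    return count / (n * (t[hi] - t[lo]))
```

With ten equally spaced grid points on $(0, 1]$ carrying ten subjects each, the estimate equals the uniform density value 1 up to rounding.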

Next, we consider the estimation of $f(x_0)$. To this end, we estimate
$f(t_l)$ using a local linear approximation: identify a small interval around
$t_l$, and then approximate $F$ over this interval by a line, whose slope gives
the estimator of $f(t_l)$. We determine the interval by the following
requirements. First, the sample proportion $p_n$ in the interval should be
larger than the sample proportion at each grid point, which is of order
$n^{-\gamma}$ for $\gamma \in (0, 1]$. For example, setting $p_n$ to be of order
$1/\log n$ theoretically ensures a sufficiently large interval. Second, for
simplicity, we make the interval symmetric around $t_l$. Third, in order to
obtain a positive estimate [since $f(t_l)$ is positive], we symmetrically
enlarge the interval satisfying the above two requirements until the values of
$\hat F$ at the two ends of the interval become different. Thus, we first find
$j^*$, the smallest integer such that
$\sum_{i=l-j^*}^{l+j^*} N_i/n > 1/\log n$. Next, we find $i^*$, the smallest
integer larger than $j^*$ such that $\hat F(t_{l-i^*}) < \hat F(t_{l+i^*})$, and
employ a linear approximation over $[t_{l-i^*}, t_{l+i^*}]$. More specifically,
we compute

$$(\hat\beta_0, \hat\beta_1) = \operatorname*{argmin}_{(\beta_0,\beta_1) \in
\mathbb{R}^2} \sum_{i=l-i^*}^{l+i^*} (\hat F(t_i) - \beta_0 - \beta_1 t_i)^2 N_i$$

and estimate $f(t_l)$ [and $f(x_0)$] by $\hat\beta_1$. Once these nuisance
parameters have been estimated, the practical adaptive procedure can be
implemented.
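
The weighted least squares step has a closed form, which the following Python sketch uses. It takes $j^*$ as an input, searches for $i^*$ among integers larger than $j^*$ as described in the text, and returns the fitted slope together with $i^*$. The function name, the zero-based indexing and the error handling are assumptions made for illustration.

```python
import numpy as np


def f_hat_local_linear(t, F_hat, N, l_idx, j_star):
    """Slope of the N-weighted least squares line through (t_i, F_hat(t_i)).

    The window is [t_{l-i*}, t_{l+i*}], where i* is the smallest integer larger
    than j_star with F_hat(t_{l-i*}) < F_hat(t_{l+i*}).
    """
    t, F_hat, N = (np.asarray(v, dtype=float) for v in (t, F_hat, N))
    i = j_star + 1
    while True:
        lo, hi = l_idx - i, l_idx + i
        if lo < 0 or hi >= t.size:
            raise ValueError("no symmetric window with distinct endpoint values")
        if F_hat[lo] < F_hat[hi]:
            break
        i += 1
    tw, Fw, w = t[lo:hi + 1], F_hat[lo:hi + 1], N[lo:hi + 1]
    t_bar = np.average(tw, weights=w)
    F_bar = np.average(Fw, weights=w)
    # Closed-form minimizer of sum_i w_i * (F_i - b0 - b1 * t_i)**2 in b1.
    slope = np.sum(w * (tw - t_bar) * (Fw - F_bar)) / np.sum(w * (tw - t_bar) ** 2)
    return slope, i
```

When the values of $\hat F$ lie exactly on a line, the returned slope reproduces that line's slope regardless of the weights.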

The above procedures provide consistent estimates of $g(x_0)$ and $f(x_0)$
under the assumption of a single derivative for $F$ and $G$ in a neighborhood
of $x_0$, irrespective of the value of $\gamma$ [since the estimates are
obtained by local polynomial fitting over a neighborhood of logarithmic order
(in $n$) around $x_0$ and such neighborhoods are guaranteed to be asymptotically
wider than $n^{-\gamma}$ for any $0 < \gamma \le 1$]. Two points need to be
noted. First, the $1/\log n$ threshold used to determine $j^*$ in the previous
paragraph may need to be changed to a multiple of $1/\log n$, depending on the
sample size and the length of the time interval. Second, the locally constant
estimate of $g(x_0)$ discussed above could be replaced by a local linear (or
quadratic) estimate of $g$, if the data strongly indicate that $G$ is changing
sharply in a neighborhood of $x_0$.

To evaluate the finite sample performance of the practical adaptive procedure,
we also provide simulated confidence intervals of an idealized (theoretical)
adaptive procedure where the true values of the parameters $F(x_0)$, $g(x_0)$
and $f(x_0)$ are used, but $\gamma$ is still practically assumed to be 1/3, and
$c$ is taken as the previous $\hat c$. These confidence intervals can be
considered as the best Wald-type confidence intervals based on the adaptive
procedure.

The simulation settings are as follows: The sampling interval $[a, b]$ is
$[0, 1]$. The design density $g$ is uniform on $[a, b]$. The distribution of
$T$ is the uniform distribution over $[a, b]$ or the exponential distribution
with $\lambda = 1$ or 2. The anchor-point $x_0$ is 0.5. The pair of
grid-parameters $(\gamma, c)$ takes values (1/6,1/6), (1/4,1/4), (1/3,1/2),
(1/2,1), (2/3,2) and (3/4,3). The sample size $n$ ranges from 100 to 1000 by
100. When generating the quantiles of $X_{\hat c}(0)$, $K_a$ is set to be 300
and the corresponding iteration number 3000. We are interested in constructing
95% confidence intervals for $F(t_l)$. The iteration number for each simulation
is 3000.

Denote the simulated coverage rates and average lengths for the practical
procedure as CR(P) and AL(P) and those for the theoretical procedure as
CR(T) and AL(T). Figure 2 contains the plots of CR(P), CR(T), AL(P)
and AL(T), and Table 1 contains the corresponding numerical values for
n = 100, 300, 500. The first panel of Figure 2 shows that both CR(T) and
CR(P) are usually close to the nominal level 95% from below and that
CR(T) is generally about 1% better than CR(P). This reflects the price of
not knowing the true values of the parameters $F(x_0)$, $g(x_0)$ and $f(x_0)$
in the practical procedure. On the other hand, the second panel of Figure 2
shows that the AL(P)s are usually slightly shorter than the AL(T)s. This
indicates that the practical procedure is slightly more aggressive. As the
sample size increases, the coverage rates usually approach the nominal level,
and the average lengths also become shorter, as expected.

Fig. 2. A comparison of the coverage rates and average lengths of the practical
and theoretical procedures, where $(\gamma_i, c_i)$ for $i = 1, \ldots, 6$ are
(1/6,1/6), (1/4,1/4), (1/3,1/2), (1/2,1), (2/3,2) or (3/4,3), respectively. The
sample size n varies from 100 to 1000 by 100.

Table 1
A comparison of the coverage rates and average lengths of the practical
procedure with those of the theoretical procedure, where U[0, 1] and
exp($\lambda$) stand for the uniform distribution over [0, 1], and the
exponential distributions with the parameter $\lambda$, and $n_1$, $n_2$ and
$n_3$ are 100, 300 and 500, respectively

Coverage rates
                     U[0,1]                  exp(1)                  exp(2)
(γ, c)          n1     n2     n3        n1     n2     n3        n1     n2     n3
(1/6,1/6)     0.940  0.947  0.953     0.940  0.949  0.946     0.931  0.941  0.946
(1/4,1/4)     0.929  0.947  0.949     0.938  0.945  0.946     0.932  0.949  0.943
(1/3,1/2)     0.943  0.940  0.948     0.941  0.951  0.946     0.928  0.939  0.936
(1/2,1)       0.940  0.949  0.946     0.941  0.944  0.950     0.939  0.945  0.950
(2/3,2)       0.946  0.950  0.941     0.941  0.951  0.947     0.935  0.957  0.943
(3/4,3)       0.939  0.953  0.947     0.945  0.948  0.944     0.930  0.950  0.946

Average lengths, AL(P)
                     U[0,1]                  exp(1)                  exp(2)
(γ, c)          n1     n2     n3        n1     n2     n3        n1     n2     n3
(1/6,1/6)     0.417  0.286  0.239     0.358  0.246  0.206     0.380  0.261  0.216
(1/4,1/4)     0.415  0.287  0.240     0.356  0.242  0.204     0.376  0.258  0.218
(1/3,1/2)     0.409  0.281  0.236     0.359  0.243  0.207     0.381  0.258  0.219
(1/2,1)       0.411  0.287  0.241     0.350  0.243  0.201     0.370  0.258  0.215
(2/3,2)       0.411  0.286  0.241     0.354  0.239  0.202     0.379  0.253  0.216
(3/4,3)       0.414  0.287  0.241     0.352  0.239  0.202     0.376  0.250  0.214

Average lengths, AL(T)
                     U[0,1]                  exp(1)                  exp(2)
(γ, c)          n1     n2     n3        n1     n2     n3        n1     n2     n3
(1/6,1/6)     0.426  0.294  0.247     0.357  0.247  0.208     0.377  0.260  0.219
(1/4,1/4)     0.426  0.295  0.248     0.357  0.247  0.208     0.377  0.261  0.220
(1/3,1/2)     0.422  0.292  0.246     0.355  0.246  0.208     0.374  0.260  0.219
(1/2,1)       0.424  0.295  0.249     0.356  0.247  0.209     0.375  0.261  0.220
(2/3,2)       0.424  0.297  0.251     0.356  0.248  0.209     0.375  0.262  0.221
(3/4,3)       0.424  0.297  0.251     0.356  0.248  0.209     0.375  0.262  0.221

The patterns noted above show up in more extensive simulation studies,
not shown here owing to constraints of space. Also, the adaptive procedure
is seen to compete well with the asymptotic approximations that one would
use for constructing CIs were $\gamma$ known.

We end this section by pointing out that while, for the simulations, we
knew the anchor-point $x_0$ ($t_l$ being the largest grid-point to the left of
or equal to $x_0$), and we did make use of its value for estimating $F(x_0)$
in our simulations, knowledge of $x_0$ is not essential to the inference
procedure. We could have just estimated $F(x_0)$ by $\hat F(t_l)$ [rather than
by a convex combination of $\hat F(t_l)$ and $\hat F(t_r)$ that depends upon
$x_0$] consistently. This is a critical observation, since in a real-life
situation what we are provided is current status data on a grid with particular
grid points of interest. There is no specification of $x_0$. To make inference
on the value of $F$ at such a grid-point, one can, conceptually, view $x_0$ as
being any point strictly in between the given point and the grid-point
immediately after, but its value is not required to construct a confidence
interval by the adaptive method. To reiterate, the "anchor-point" $x_0$ was
introduced for developing our theoretical results, but its value can be ignored
for the implementation of our method in practice.

6. Concluding discussion. In this paper, we considered maximum likelihood
estimation for the event time distribution function, $F$, at a grid point
in the current status model with i.i.d. data and observation times lying on
a regular grid. The spacing of the grid $\delta$ was specified as
$cn^{-\gamma}$ for constants $c > 0$ and $0 < \gamma \le 1$ in order to
incorporate situations where there are systematic ties in observation times,
and the number of distinct observation times can increase with the sample size.
The asymptotic properties of the NPMLE were shown to depend on the order of the
grid resolution $\gamma$ and an adaptive procedure, which circumvents the
estimation of the unknown $\gamma$ and $c$, was proposed for the construction
of asymptotically correct confidence intervals for the value of $F$ at a
grid-point of interest. We conclude with a description of alternative methods
for inference in this problem and potential directions for future research.

Likelihood ratio based inference: An alternative to the Wald-type adaptive
confidence intervals proposed in this paper would be to use those obtained
via likelihood ratio inversion. More specifically, one could consider testing
the null hypothesis $H_0$ that $F(t_l) = \theta_l$ versus its complement using
the likelihood ratio statistic (LRS). When the null hypothesis is true, the LRS
converges weakly to $\chi_1^2$ in the limit for $\gamma < 1/3$, to
$\mathbb{D}$, the parameter-free limit discovered by Banerjee and Wellner
(2001), for $\gamma > 1/3$, and to a discrete analog of $\mathbb{D}$ depending
on $c, \alpha, \beta$, say $\mathbb{M}_{c,\alpha,\beta}$, that can be written in
terms of slopes of unconstrained and appropriately constrained convex minorants
of the process $\mathcal{P}_{c,\alpha,\beta}$, for $\gamma = 1/3$. Thus, one
obtains a boundary distribution for the likelihood ratio statistic as well, and
a phenomenon similar to that observed in Section 4 transpires, with the boundary
distribution converging to $\chi_1^2$ as $c \to \infty$ and to that of
$\mathbb{D}$ as $c \to 0$. An adaptive procedure, which performs an inversion by
calibrating the likelihood ratio statistics for testing a family of null
hypotheses of the form $F(t_l) = \theta$ for varying $\theta$, using the
quantiles of $\mathbb{M}_{\hat c,\hat\alpha,\hat\beta}$, can also be developed
but is computationally more burdensome than the Wald-type intervals. See Tang,
Banerjee and Kosorok (2010) for the details.

Smoothed estimators: We recall that all our results have been developed
under minimal smoothness assumptions on $F$: throughout the paper, we
assume $F$ to be once continuously differentiable with a nonvanishing
derivative around $x_0$. We used the NPMLE to make inference on $F$ since it
can be computed without specifying bandwidths; furthermore, under our minimal
assumptions, its pointwise rate of convergence when $\gamma > 1/3$ or when the
observation times arise from a continuous distribution cannot be bettered
by a smoothed estimator. However, if one makes the assumption of a second
derivative at $x_0$, the kernel-smoothed NPMLE (and related variants)
can achieve a convergence rate of $n^{2/5}$ (which is faster than the rate of
the NPMLE) using a bandwidth of order $n^{-1/5}$. See Groeneboom, Jongbloed and
Witte (2010) where these results are developed and also an earlier paper due
to Mammen (1991) dealing with monotone regression. In such a situation,
one could envisage using a smoothed version of the NPMLE in this problem
with a bandwidth larger than the resolution of the grid, and it is conceivable
that an adaptive procedure could be developed along these lines. While this
is certainly an interesting and important topic for further exploration, it is
outside the scope of this work, not least owing to the fact that the
assumptions underlying such a procedure are different (two derivatives as
opposed to one) from those in this paper.

Further possibilities: The results in this paper reveal some new directions 
for future research. As touched upon in the Introduction, some recent re- 
lated work by Maathuis and Hudgens (2011) deals with the estimation of 
competing risks current status data under finitely many risks with finitely 
many discrete (or grouped) observation times. A natural question of in- 
terest, then, is what happens if the observation times in their paper are 
supported on grids of increasing size as considered in this paper for simple 
current status data. We suspect that a similar adaptive procedure relying
on a boundary phenomenon at $\gamma = 1/3$ can also be developed in this case.
Furthermore, one could consider the problem of grouped current status data
(with and without the element of competing risks), where the observation
times are not exactly known but grouped into bins. Based on communications
with us and preliminary versions of this paper, Maathuis and Hudgens
(2011) conjecture that for grouped current status data without competing
risks, one may expect findings similar to those in this paper, depending on
whether the number of groups increases at rate $n^{1/3}$ or at a faster/slower
rate, and it would not be unreasonable to expect a similar thing to happen
for grouped current status data with finitely many competing risks. In fact,
an adaptive inference procedure very similar to that in this paper should 
also work for the problem treated in Zhang, Kim and Woodroofe (2001) 
and allow inference for the decreasing density of interest without needing to 
know the rate of growth of the bins. 

It is also fairly clear that the adaptive inference scheme proposed in this 
paper will apply to monotone regression models with discrete covariates in 
general. In particular, the very general conditionally parametric response 
models studied in Banerjee (2007) under the assumption of a continuous 
covariate can be handled for the discrete covariate case as well by adapting 
the methods of this paper. Furthermore, similar adaptive inference in more 
complex forms of interval censoring, like Case-2 censoring or mixed-case 
censoring [see, e.g., Sen and Banerjee (2007) and Schick and Yu (2000)], 
should also be possible in situations where the multiple observation times
are discrete-valued. Finally, we conjecture that phenomena similar to those
revealed in this paper will appear in nonparametric regression problems
with grid-supported covariates under more complex shape constraints (like
convexity, e.g.), though the boundary value of $\gamma$ as well as the nature
of the nonstandard limits will be different and will depend on the "order" of
the shape constraint. This will also be a topic of future research.



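APPENDIX: PROOFS

Proof of Theorem 4.1. For $k \in \mathbb{Z}$, let

$h(k) = \alpha\sqrt{c}\, W(ck) + \beta c^{5/2} k(1 + k)$,
$\tilde h(k) = \alpha c\, W(k) + \beta c^{5/2} k(1 + k)$.

Then, we have $\{h(k), k \in \mathbb{Z}\} =_d \{\tilde h(k), k \in \mathbb{Z}\}$.
Thus,

$\sqrt{c}\, S_c = \mathrm{LS} \circ \mathrm{GCM}\{(ck, h(k)), k \in \mathbb{Z}\}(0)
=_d \mathrm{LS} \circ \mathrm{GCM}\{(ck, \tilde h(k)), k \in \mathbb{Z}\}(0)$.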

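Define $\tilde S_c = \mathrm{LS} \circ \mathrm{GCM}\{(ck, \tilde h(k)),
k \in \mathbb{Z}\}(0)$, so that $\sqrt{c}\, S_c =_d \tilde S_c$. Denote

$A_c = \{\tilde h(k)/(ck) \le \tilde h(k+1)/(c(k+1)),\ k = 1, 2, \ldots\}$,

$B_c = \{\tilde h(-(k-1))/(c(k-1)) \le \tilde h(-k)/(ck),\ k = 2, 3, \ldots\}$,

$C_c = \{\tilde h(1) \ge -\tilde h(-1)\}$.

Then, for $\omega \in A_c B_c C_c$, it is easy to see $\tilde S_c = -\alpha W(-1)$.
We will show in Lemma A.1 that $1_{A_c B_c C_c} \to_P 1$. Thus,
$\tilde S_c = \tilde S_c 1_{A_c B_c C_c} + \tilde S_c (1 - 1_{A_c B_c C_c})
\to_d -\alpha W(-1) =_d \alpha Z$, with $Z \sim N(0, 1)$. Therefore,
$\sqrt{c}\, S_c \to_d \alpha Z$. □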

Lemma A.1. Each of $1_{A_c}$, $1_{B_c}$ and $1_{C_c}$ in the proof of Theorem
4.1 converges to 1 in probability.

Proof. It is easy to show $1_{C_c}$ converges to 1 in probability. The argument
that $1_{A_c}$ converges to one in probability is similar to that for
$1_{B_c}$, and only the former is established here. In order to show
$P(A_c) \to 1$, it suffices to show $P(A_c^c) \to 0$. We have, for each integer
$k \ge 1$,

$$P\left(\frac{\tilde h(k)}{ck} > \frac{\tilde h(k+1)}{c(k+1)}\right)
= P\left(\frac{\alpha W(k)}{k} + \beta c^{3/2}(k+1) >
\frac{\alpha W(k+1)}{k+1} + \beta c^{3/2}(k+2)\right)$$
$$= P\left(\alpha\left(\frac{W(k)}{k} - \frac{W(k+1)}{k+1}\right) >
\beta c^{3/2}\right)$$
$$= P\left(N(0, 1) > \alpha^{-1}\beta c^{3/2}\sqrt{k(k+1)}\right)$$
$$\le 2^{-1}\exp\{-2^{-1}\alpha^{-2}\beta^2 c^3 k(k+1)\}$$

using the fact that $W(k)/k - W(k+1)/(k+1) \sim N(0, (k(k+1))^{-1})$ and the
inequality $P(N(0, 1) > x) \le 2^{-1}\exp\{-2^{-1}x^2\}$ for $x > 0$ [see,
e.g., (2) on page 317 of Pollard (2002)]. Then, we have

$$P(A_c^c) \le \sum_{k=1}^{\infty} 2^{-1}\exp\{-2^{-1}\alpha^{-2}\beta^2 c^3 k(k+1)\}
\le 2^{-1}\int_0^{\infty} \exp\{-2^{-1}\alpha^{-2}\beta^2 c^3 x^2\}\, dx
= (\sqrt{2\pi}/4)\,\alpha\beta^{-1}c^{-3/2} \to 0$$

as $c \to \infty$. Thus, $P(A_c) \to 1$, which completes the proof. □

Proof of Theorem 4.2. We want to show that $S_c \to_d g_{\alpha,\beta}(0)$, as
$c \to 0$, where $g_{\alpha,\beta}(0) = \mathrm{LS} \circ
\mathrm{GCM}\{X_{\alpha,\beta}\}(0) = \mathrm{LS} \circ
\mathrm{GCM}\{X_{\alpha,\beta}(t) : t \in \mathbb{R}\}(0)$ and
$S_c = \mathrm{LS} \circ \mathrm{GCM}\{\mathcal{P}_c\}(0) = \mathrm{LS} \circ
\mathrm{GCM}\{\mathcal{P}_c(k) : k \in \mathbb{Z}\}(0)$. Since
$S_c = S'_c + \beta c$, where $S'_c = \mathrm{LS} \circ
\mathrm{GCM}\{\mathcal{P}'_c(k) : k \in \mathbb{Z}\}(0)$ and
$\mathcal{P}'_c = \{(ck, \alpha W(ck) + \beta(ck)^2) : k \in \mathbb{Z}\}$, it
is sufficient to show $S'_c \to_d g_{\alpha,\beta}(0)$ as $c \to 0$. To make the
notation simple and without causing confusion, in the following we still use
$\mathcal{P}_c$ and $S_c$ to denote $\mathcal{P}'_c$ and $S'_c$. Also, it will
be useful to think of $\mathcal{P}_c$ as a continuous process on $\mathbb{R}$
formed by linearly interpolating the points
$\{(ck, \mathcal{P}_{2,c}(ck)) : k \in \mathbb{Z}\}$, where
$\mathcal{P}_{2,c}(ck) = \alpha W(ck) + \beta(ck)^2 = X_{\alpha,\beta}(ck)$.
Note that viewing $\mathcal{P}_c$ in this way keeps the GCM unaltered, that is,
the GCM of this continuous linear interpolated version is the same as that of
the set of points $\{(ck, \mathcal{P}_{2,c}(ck)) : k \in \mathbb{Z}\}$, and the
slope-changing points of this piecewise linear GCM are still grid-points of the
form $ck$.

Fig. 3. An illustration for showing $\{L_c\}$ is $O_P(1)$ in the proof of
Theorem 4.2.

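Let $L$ and $U$ be the largest negative and smallest nonnegative x-axis
coordinates of the slope changing points of the GCM of $X_{\alpha,\beta}$.
Similarly, let $L_c$ and $U_c$ be the largest negative and smallest nonnegative
x-axis coordinates of the slope changing points of the GCM of $\mathcal{P}_c$.
For $K > 0$, define $g^K_{\alpha,\beta}(0) = \mathrm{LS} \circ
\mathrm{GCM}\{X_{\alpha,\beta}(t) : t \in [-K, K]\}(0)$ and
$S^K_c = \mathrm{LS} \circ \mathrm{GCM}\{\mathcal{P}_c(t) : t \in [-K, K]\}(0)$.

We will show that, given $\varepsilon > 0$, there exist $M_\varepsilon > 0$ and
$c(\varepsilon)$ such that (a) for all $0 < c < c(\varepsilon)$,
$P(S^{M_\varepsilon}_c \ne S_c) < \varepsilon$ and (b)
$P(g^{M_\varepsilon}_{\alpha,\beta}(0) \ne g_{\alpha,\beta}(0)) < \varepsilon$.
These immediately imply that both Fact 1:
$\lim_{\varepsilon \to 0} \limsup_{c \to 0} P(S^{M_\varepsilon}_c \ne S_c) = 0$
and Fact 2: $\lim_{\varepsilon \to 0} P(g^{M_\varepsilon}_{\alpha,\beta}(0) \ne
g_{\alpha,\beta}(0)) = 0$ hold. We then show that Fact 3: for each
$\varepsilon > 0$, $S^{M_\varepsilon}_c \to_d g^{M_\varepsilon}_{\alpha,\beta}(0)$
holds as well. Then, by Lemma 3.9, we have the conclusion
$S_c \to_d g_{\alpha,\beta}(0)$. Figure 3 illustrates the following argument.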

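Let $\tau_{-2} < \tau_{-1} < \tau_{1} < \tau_{2}$ be four consecutive slope changing points of
$G_{\alpha,\beta} = \mathrm{GCM}\{X_{\alpha,\beta}\}$ with $\tau_{-1}$ denoting the first slope changing point to the
left of 0 and $\tau_{1}$ the first slope changing point to the right. Since $\tau_{-2}$ and $\tau_{2}$
are $O_P(1)$, given $\epsilon > 0$, there exists $M_\epsilon > 0$ such that $P(-M_\epsilon < \tau_{-2} < \tau_{2} < M_\epsilon) > 1 - \epsilon/4$.
Note that the event $\{g^{M_\epsilon}_{\alpha,\beta}(0) = g_{\alpha,\beta}(0)\} \supset \{-M_\epsilon < \tau_{-2} < \tau_{2} < M_\epsilon\}$,
and it follows that $P(g^{M_\epsilon}_{\alpha,\beta}(0) \ne g_{\alpha,\beta}(0)) < \epsilon/4 < \epsilon$. Thus, (b) holds.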

Next, consider the chord $C_1(t)$ joining $(0, G_{\alpha,\beta}(0))$ and $(\tau_{-2}, G_{\alpha,\beta}(\tau_{-2}))$.
By the convexity of $G_{\alpha,\beta}$ over $[\tau_{-2}, 0]$ and $\tau_{-1} \in (\tau_{-2}, 0)$ being a slope changing
point, $X_{\alpha,\beta}(\tau_{-1}) = G_{\alpha,\beta}(\tau_{-1}) < C_1(\tau_{-1})$. But $C_1(0) = G_{\alpha,\beta}(0) < X_{\alpha,\beta}(0)$,
and it follows by the intermediate value theorem that $\xi = \inf_{\tau_{-1} < t < 0}\{t : X_{\alpha,\beta}(t) = C_1(t)\}$
is well defined (since the set in question is nonempty), $\tau_{-1} < \xi < 0$,
$C_1(\xi) = X_{\alpha,\beta}(\xi)$ and on $[\tau_{-1}, \xi)$, $X_{\alpha,\beta}(t) < C_1(t)$. Let $V = \xi - \tau_{-1}$.
Since $V$ is a continuous and positive random variable, there exists $\delta(\epsilon) > 0$
such that $P(V > \delta(\epsilon)) > 1 - \epsilon/4$. Then, the event $E_\epsilon = \{V > \delta(\epsilon)\} \cap \{-M_\epsilon < \tau_{-2}\}$
has probability larger than $1 - \epsilon/2$. For any $c < c(\epsilon) =: \delta(\epsilon)$, we claim
that $L_c > \tau_{-2}$ on the event $E_\epsilon$, and the argument for this follows below.

If $L_c \le \tau_{-2}$, consider the chord $C_2(t)$ connecting the two points $(L_c, \mathcal{P}_{2,c}(L_c))$
and $(U_c, \mathcal{P}_{2,c}(U_c))$. This chord must lie strictly above the chord $\{C_1(t) : \tau_{-1} < t < 0\}$
since it can be viewed as a restriction of a chord connecting two points
$(t_1, G_{\alpha,\beta}(t_1))$ and $(t_2, G_{\alpha,\beta}(t_2))$ with $t_1 < L_c < \tau_{-1} < 0 < U_c < t_2$. It then
follows that all points of the form $\{(ck, \mathcal{P}_{2,c}(ck) = X_{\alpha,\beta}(ck)) : ck \in [L_c, U_c]\}$
must lie above $C_2(t)$. But there is at least one $ck^*$ with $\tau_{-1} < ck^* < \xi$ and
such that $X_{\alpha,\beta}(ck^*) < C_1(ck^*) < C_2(ck^*)$, which furnishes a contradiction.

We conclude that for any $c < c(\epsilon)$, $P(-M_\epsilon < L_c) > 1 - \epsilon/2$. A similar
argument to the right-hand side of 0 shows that for the same $c$'s (by the
symmetry of two-sided Brownian motion about the origin), $P(U_c < M_\epsilon) > 1 - \epsilon/2$.
Hence $P(-M_\epsilon < L_c < U_c < M_\epsilon) > 1 - \epsilon$. On this event, clearly
$S^{M_\epsilon}_c = S_c$, and it follows that for all $c < c(\epsilon)$, $P(S^{M_\epsilon}_c \ne S_c) < \epsilon$. Thus, (a)
also holds and Facts 1 and 2 are established.

It remains to establish Fact 3. This follows easily. For almost every $\omega$,
$X_{\alpha,\beta}(t)$ is uniformly continuous on $[\pm 2M_\epsilon]$. It follows by elementary analysis
that (for almost every $\omega$) on $[\pm M_\epsilon]$, the process $\mathcal{P}_c$, being the linear
interpolant of the points $\{(ck, X_{\alpha,\beta}(ck)) : -M_\epsilon < ck < M_\epsilon\} \cup \{(-M_\epsilon, \mathcal{P}_{2,c}(-M_\epsilon)), (M_\epsilon, \mathcal{P}_{2,c}(M_\epsilon))\}$,
converges uniformly to $X_{\alpha,\beta}$ as $c \to 0$. Thus, the left slope
of the GCM of $\{\mathcal{P}_c(t) : t \in [\pm M_\epsilon]\}$, which is precisely $S^{M_\epsilon}_c$, converges to
$g^{M_\epsilon}_{\alpha,\beta}(0)$ since the GCM of the restriction of $X_{\alpha,\beta}$ to $[\pm M_\epsilon]$ is almost surely
differentiable at 0; see, for example, the Lemma on page 330 of Robertson,
Wright and Dykstra (1988) for a justification of this convergence. □
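The convergence in Fact 3 can be illustrated on a single simulated path. The sketch below is a numerical illustration under stated assumptions, not part of the proof: the process $X_{\alpha,\beta}(t) = \alpha W(t) + \beta t^{2}$ is sampled on a fine grid of step $h$ over $[-M, M]$ (the fine grid serving only as a proxy for the continuous path), and the left slope at 0 of the GCM is recomputed on coarser sub-grids of resolution $c = mh$. The function names, the seed and all parameter values are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def left_slope_gcm(ts: Sequence[float], xs: Sequence[float]) -> float:
    """Left slope at 0 of the GCM (lower convex hull) of the points (ts[i], xs[i])."""
    hull: List[Tuple[float, float]] = []
    for p in sorted(zip(ts, xs)):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()  # hull[-1] lies on or above the chord, so it is not a vertex
            else:
                break
        hull.append(p)
    for (xa, ya), (xb, yb) in zip(hull, hull[1:]):
        if xa < 0.0 <= xb:
            return (yb - ya) / (xb - xa)
    raise ValueError("the abscissae must include points on both sides of 0")


def simulate_path(alpha: float, beta: float, M: float, n_half: int,
                  rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample X(t) = alpha*W(t) + beta*t^2 at t = i*h, i = -n_half..n_half, h = M/n_half.

    W is a two-sided Brownian motion: independent Gaussian increments are
    accumulated to the right and to the left of the origin, with W(0) = 0.
    """
    h = M / n_half
    sd = h ** 0.5
    right, left = [0.0], [0.0]
    for _ in range(n_half):
        right.append(right[-1] + rng.gauss(0.0, sd))
        left.append(left[-1] + rng.gauss(0.0, sd))
    w = left[:0:-1] + right  # W(-M), ..., W(-h), W(0), W(h), ..., W(M)
    ts = [i * h for i in range(-n_half, n_half + 1)]
    xs = [alpha * wi + beta * t * t for wi, t in zip(w, ts)]
    return ts, xs


def grid_left_slopes(ts: Sequence[float], xs: Sequence[float],
                     strides: Sequence[int]) -> Dict[int, float]:
    """Left slope at 0 using every m-th fine-grid point, i.e. grid resolution c = m*h."""
    n_half = (len(ts) - 1) // 2
    out: Dict[int, float] = {}
    for m in strides:
        if n_half % m != 0:
            raise ValueError("each stride must divide n_half so that 0 stays on the grid")
        out[m] = left_slope_gcm(ts[::m], xs[::m])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    ts, xs = simulate_path(alpha=1.0, beta=1.0, M=2.0, n_half=4000, rng=rng)
    for m, slope in grid_left_slopes(ts, xs, (1000, 200, 40, 8, 1)).items():
        print(f"c = {m * 2.0 / 4000:.4f}  left slope at 0 = {slope:.4f}")
```

Stride 1 reproduces the fine-grid value by construction; the coarser strides show how the grid-based slope behaves as $c$ decreases on this one path.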
SUPPLEMENTARY MATERIAL

More proofs for the current paper "Likelihood based inference for current 
status data on a grid: A boundary phenomenon and an adaptive inference 
procedure" (DOI: 10.1214/11-AOS942SUPP; .pdf). The supplementary ma- 
terial contains the details of the proofs of several theorems and lemmas in 
Sections 3.1 and 3.3 of this paper. 